Neural Networks




Neural networks are composed of simple elements operating in parallel. These
elements are inspired by biological nervous systems. As in nature, the network
function is determined largely by the connections between elements. We can
train a neural network to perform a particular function by adjusting the values
of the connections (weights) between elements.


Commonly neural networks are adjusted, or trained, so that a particular input
leads to a specific target output. Such a situation is shown below. There, the
network is adjusted, based on a comparison of the output and the target, until
the network output matches the target. Typically many such input/target pairs
are used, in this supervised learning, to train a network.


[Figure: The input is fed to the Neural Network (including connections, called weights, between neurons); the output is compared with the Target, and the weights are adjusted.]


Batch training of a network proceeds by making weight and bias changes based
on an entire set (batch) of input vectors. Incremental training changes the
weights and biases of a network as needed after presentation of each individual
input vector. Incremental training is sometimes referred to as "on line" or
"adaptive" training.
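To make the difference between the two styles concrete, here is a minimal plain-Python sketch (not toolbox code) of one pass of each style for a single linear neuron. It assumes the Widrow-Hoff (LMS) learning rule and a hypothetical learning rate `lr`; the function names `train_incremental` and `train_batch` are illustrative only.

```python
# Sketch only: one pass of incremental vs. batch training for a linear
# neuron a = w.p + b, assuming the Widrow-Hoff (LMS) learning rule.

def predict(w, b, p):
    """Linear neuron output a = w.p + b."""
    return sum(wi * pi for wi, pi in zip(w, p)) + b


def train_incremental(w, b, inputs, targets, lr):
    """Update the weights and bias immediately after each input vector."""
    w = list(w)
    for p, t in zip(inputs, targets):
        e = t - predict(w, b, p)                      # error uses the current weights
        w = [wi + lr * e * pi for wi, pi in zip(w, p)]
        b = b + lr * e
    return w, b


def train_batch(w, b, inputs, targets, lr):
    """Accumulate changes over the whole batch, then apply them once."""
    dw = [0.0] * len(w)
    db = 0.0
    for p, t in zip(inputs, targets):
        e = t - predict(w, b, p)                      # weights stay fixed during the pass
        dw = [dwi + lr * e * pi for dwi, pi in zip(dw, p)]
        db += lr * e
    return [wi + dwi for wi, dwi in zip(w, dw)], b + db


inputs = [[1.0, 2.0], [2.0, 1.0]]
targets = [1.0, 0.0]
w_inc, b_inc = train_incremental([0.0, 0.0], 0.0, inputs, targets, lr=0.1)
w_bat, b_bat = train_batch([0.0, 0.0], 0.0, inputs, targets, lr=0.1)
print(w_inc, b_inc)   # approximately [0.0, 0.15] 0.05
print(w_bat, b_bat)   # approximately [0.1, 0.2] 0.1
```

The two passes end with different weights because, in the incremental version, the error for the second vector is computed after the weights have already been changed by the first.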


Neural networks have been trained to perform complex functions in various
fields of application including pattern recognition, identification, classification,
speech, vision and control systems. A list of applications is given in Chapter 1.


Today neural networks can be trained to solve problems that are difficult for
conventional computers or human beings. Throughout the toolbox emphasis is
placed on neural network paradigms that build up to or are themselves used in
engineering, financial and other practical applications.







The supervised training methods are commonly used, but other networks can
be obtained from unsupervised training techniques or from direct design
methods. Unsupervised networks can be used, for instance, to identify groups
of data. Certain kinds of linear networks and Hopfield networks are designed
directly. In summary, there are a variety of kinds of design and learning
techniques that enrich the choices that a user can make.


The field of neural networks has a history of some five decades but has found
solid application only in the past fifteen years, and the field is still developing
rapidly. Thus, it is distinctly different from the fields of control systems or
optimization where the terminology, basic mathematics, and design
procedures have been firmly established and applied for many years. We do not
view the Neural Network Toolbox as simply a summary of established
procedures that are known to work well. Rather, we hope that it will be a useful
tool for industry, education and research, a tool that will help users find what
works and what doesn't, and a tool that will help develop and extend the field
of neural networks. Because the field and the material are so new, this toolbox
will explain the procedures, tell how to apply them, and illustrate their
successes and failures with examples. We believe that an understanding of the
paradigms and their application is essential to the satisfactory and successful
use of this toolbox, and that without such understanding user complaints and
inquiries would bury us. So please be patient if we include a lot of explanatory
material. We hope that such material will be helpful to you.
Basic Chapters


The Neural Network Toolbox is written so that if you read Chapter 2, Chapter
3 and Chapter 4 you can proceed to a later chapter, read it and use its functions
without difficulty. To make this possible, Chapter 2 presents the fundamentals
of the neuron model and the architectures of neural networks. It also discusses
the notation used in the architectures. All of this is basic material. It is to your
advantage to understand this Chapter 2 material thoroughly.


The neuron model and the architecture of a neural network describe how a 
network transforms its input into an output. This transformation can be
viewed as a computation. The model and the architecture each place
limitations on what a particular neural network can compute. The way a 
network computes its output must be understood before training methods for 
the network can be explained. 


Mathematical Notation for Equations and Figures


Basic Concepts
Scalars - small italic letters ... a, b, c


Vectors - small bold non-italic letters ... a, b, c


Matrices - capital BOLD non-italic letters ... A, B, C


Language 
Vector means a column of numbers. 


Weight Matrices


Scalar Element $w_{i,j}(t)$
$i$ - row, $j$ - column, $t$ - time or iteration


Matrix $\mathbf{W}(t)$

Column Vector $\mathbf{w}_j(t)$

Row Vector ${}_{i}\mathbf{w}(t)$ - vector made of the $i$th row of weight matrix $\mathbf{W}$
Bias Vector

Scalar Element $b_i(t)$


Vector $\mathbf{b}(t)$


Layer Notation


A single superscript is used to identify elements of a layer. For instance, the net
input of layer 3 would be shown as $\mathbf{n}^3$.


Superscripts $k, l$ are used to identify the source ($l$) connection and the
destination ($k$) connection of layer weight matrices and input weight matrices.
For example, the layer weight matrix from layer 2 to layer 4 would be shown
as $\mathbf{LW}^{4,2}$.
Input Weight Matrix $\mathbf{IW}^{k,l}$


Layer Weight Matrix $\mathbf{LW}^{k,l}$


Figure and Equation Examples 


The following figure, taken from Chapter 12, illustrates notation used in such
advanced figures.


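[Figure: Inputs, Layers 1 and 2, Layer 3, Outputs.]

$$\mathbf{a}^2(t) = \mathrm{logsig}\left(\mathbf{IW}^{2,1}\left[\mathbf{p}^1(t); \mathbf{p}^1(t-1)\right] + \mathbf{IW}^{2,2}\mathbf{p}^2(t)\right)$$

$$\mathbf{a}^3(t) = \mathrm{purelin}\left(\mathbf{LW}^{3,3}\mathbf{a}^3(t-1) + \mathbf{LW}^{3,1}\mathbf{a}^1(t) + \mathbf{b}^3 + \mathbf{LW}^{3,2}\mathbf{a}^2(t)\right)$$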
Mathematics and Code Equivalents


The transition from mathematics to code or vice versa can be made with the aid
of a few rules. They are listed here for future reference.


To change from mathematics notation to MATLAB notation, the user needs to:
• Change superscripts to cell array indices.
  For example, $p^1 \rightarrow$ p{1}
• Change subscripts to parentheses indices.
  For example, $p_2 \rightarrow$ p(2), and $p^1_2 \rightarrow$ p{1}(2)
• Change parentheses indices to a second cell array index.
  For example, $p^1(t-1) \rightarrow$ p{1,t-1}
• Change mathematics operators to MATLAB operators and toolbox functions.
  For example, $ab \rightarrow$ a*b


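The following equations illustrate the notation used in figures.


$$n = w_{1,1}p_1 + w_{1,2}p_2 + \dots + w_{1,R}p_R + b$$


$$\mathbf{W} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,R} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,R} \\ \vdots & \vdots & & \vdots \\ w_{S,1} & w_{S,2} & \cdots & w_{S,R} \end{bmatrix}$$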







Neural Network Design Book 


Professor Martin Hagan of Oklahoma State University, and Neural Network
Toolbox authors Howard Demuth and Mark Beale have written a textbook,
Neural Network Design, published by the Brooks/Cole Publishing Company in
1996 (ISBN 0-534-94332-2). The book presents the theory of neural networks,
discusses their design and application, and makes considerable use of MATLAB
and the Neural Network Toolbox. Demonstration programs from the book are
used in various chapters of this Guide. (You can find all the book
demonstration programs in the Neural Network Toolbox by typing nnd.)


The book has: 


• An INSTRUCTOR'S MANUAL (ISBN 0-534-95049-3) for adopters and


• TRANSPARENCY OVERHEADS for class use. The overheads come one to a
page for instructor use and three to a page for student use.


To place an order for the book, call 1-800-354-9706. 


To obtain a copy of the INSTRUCTOR'S MANUAL, contact the Brooks/Cole
Editorial Office, phone 1-800-354-0092. Ask specifically for an instructor's
manual if you are instructing a class and want one.


You can go directly to the Brooks/Cole Neural Network Design page at 


http://brookscole.com/engineering/nnd.html 
Once there, you can download the TRANSPARENCY MASTERS with a click
on "Transparency Masters (3.6MB)."
Alternatively, you might try the Brooks/Cole Thomson Learning Web site home
page:

http://brookscole.com


Once there, pick Engineering, then Electrical Engineering, and finally "Browse."
This will get you a list of books, including Neural Network Design.
Basic Chapters


Chapter 2 contains basic material about network architectures and notation
specific to this toolbox. Chapter 3 includes the first reference to basic functions
such as init and adapt. Chapter 4 describes the use of the functions designd
and train, and discusses delays. Chapter 2, Chapter 3, and Chapter 4 should
be read before going to later chapters.


Help and Installation


The Neural Network Toolbox is contained in a directory called nnet. Type help
nnet for a listing of help topics.


A number of demonstrations are included in the toolbox. Each example states
a problem, shows the network used to solve the problem, and presents the final
results. Lists of the neural network demonstration and application scripts that
are discussed in this guide can be found by typing help nndemos.


Instructions for installing the Neural Network Toolbox are found in one of two
MATLAB documents: the Installation Guide for PC or the Installation Guide for
UNIX.
What's New in Version 4.0


A few of the new features and improvements introduced with this version of the
Neural Network Toolbox are discussed below.


Control System Applications 


A new Chapter 6 presents three practical control systems applications:


• Network model predictive control
• Model reference adaptive control
• Feedback linearization controller


Graphical User Interface 

A graphical user interface has been added to the toolbox. This interface allows
you to:

• Create networks

• Enter data into the GUI

• Initialize, train, and simulate networks

• Export the training results from the GUI to the command line workspace

• Import data from the command line workspace to the GUI


To open the Network/Data Manager window, type nntool.


New Training Functions 

The toolbox now has four training algorithms that apply weight and bias
learning rules. One algorithm applies the learning rules in batch mode. Three
algorithms apply learning rules in three different incremental modes:
• trainb - Batch training function

• trainc - Cyclical order incremental training function

• trainr - Random order incremental training function

• trains - Sequential order incremental training function


All four functions present the whole training set in each epoch (pass through 
the entire input set). 
Note We no longer recommend using trainwb and trainwb1, which have
been replaced by trainb and trainr. The function trainr differs from
trainwb1 in that trainwb1 only presented a single vector each epoch instead
of going through all vectors, as is done by trainr.





These new training functions are relatively fast because they generate M-code.
The functions trainb, trainc, trainr, and trains all generate a temporary
M-file consisting of specialized code for training the current network in
question.


Design of General Linear Networks 


The function newlind now allows the design of linear networks with multiple
inputs, outputs, and input delays.


Improved Early Stopping 


Early stopping can now be used in combination with Bayesian regularization.
In some cases this can improve the generalization capability of the trained
network.


Generalization and Speed Benchmarks 


Generalization benchmarks comparing the performance of Bayesian 
regularization and early stopping are provided. We also include speed 
benchmarks, which compare the speed of convergence of the various training 
algorithms on a variety of problems in pattern recognition and function 
approximation. These benchmarks can aid users in selecting the appropriate 
algorithm for their problem. 


Demonstration of a Sample Training Session 


A new demonstration that illustrates a sample training session is included in
Chapter 5. A sample training session script is also provided. Users can modify
this script to fit their problem.
Neural Network Applications


Applications in this Toolbox


Chapter 6 describes three practical neural network control system 
applications, including neural network model predictive control, model 
reference adaptive control, and a feedback linearization controller. 


Other neural network applications are described in Chapter 11. 


Business Applications


The 1988 DARPA Neural Network Study [DARP88] lists various neural
network applications, beginning in about 1984 with the adaptive channel
equalizer. This device, which is an outstanding commercial success, is a
single-neuron network used in long-distance telephone systems to stabilize voice
signals. The DARPA report goes on to list other commercial applications,
including a small word recognizer, a process monitor, a sonar classifier, and a
risk analysis system.


Neural networks have been applied in many other fields since the DARPA
report was written. A list of some applications mentioned in the literature
follows.


Aerospace


• High performance aircraft autopilot, flight path simulation, aircraft control
systems, autopilot enhancements, aircraft component simulation, aircraft
component fault detection


Automotive
• Automobile automatic guidance system, warranty activity analysis


Banking


• Check and other document reading, credit application evaluation


Credit Card Activity Checking


• Neural networks are used to spot unusual credit card activity that might
possibly be associated with loss of a credit card.
Defense
• Weapon steering, target tracking, object discrimination, facial recognition,
new kinds of sensors, sonar, radar and image signal processing including
data compression, feature extraction and noise suppression, signal/image
identification


Electronics


• Code sequence prediction, integrated circuit chip layout, process control,
chip failure analysis, machine vision, voice synthesis, nonlinear modeling


Entertainment
• Animation, special effects, market forecasting


Financial


• Real estate appraisal, loan advisor, mortgage screening, corporate bond
rating, credit-line use analysis, portfolio trading program, corporate
financial analysis, currency price prediction


Industrial


• Neural networks are being trained to predict the output gasses of furnaces
and other industrial processes. They then replace complex and costly
equipment used for this purpose in the past.


Insurance
• Policy application evaluation, product optimization


Manufacturing


• Manufacturing process control, product design and analysis, process and
machine diagnosis, real-time particle identification, visual quality
inspection systems, beer testing, welding quality analysis, paper quality
prediction, computer-chip quality analysis, analysis of grinding operations,
chemical product design analysis, machine maintenance analysis, project
bidding, planning and management, dynamic modeling of chemical process
systems
Medical


• Breast cancer cell analysis, EEG and ECG analysis, prosthesis design,
optimization of transplant times, hospital expense reduction, hospital
quality improvement, emergency-room test advisement


Oil and Gas


• Exploration


Robotics
• Trajectory control, forklift robot, manipulator controllers, vision systems


Speech


• Speech recognition, speech compression, vowel classification, text-to-speech
synthesis


Securities
• Market analysis, automatic bond rating, stock trading advisory systems


Telecommunications


• Image and data compression, automated information services, real-time
translation of spoken language, customer payment processing systems


Transportation
• Truck brake diagnosis systems, vehicle scheduling, routing systems


Summary 


The list of additional neural network applications, the money that has been
invested in neural network software and hardware, and the depth and breadth
of interest in these devices have been growing rapidly. The authors hope that
this toolbox will be useful for neural network educational and design purposes
within a broad field of neural network applications.
Neuron Model

Simple Neuron

Transfer Functions
Neuron with Vector Input


Network Architectures
A Layer of Neurons
Multiple Layers of Neurons


Data Structures

Simulation With Concurrent Inputs in a Static Network
Simulation With Sequential Inputs in a Dynamic Network
Simulation With Concurrent Inputs in a Dynamic Network


Training Styles
Incremental Training (of Adaptive and Other Networks)
Batch Training
Neuron Model


Simple Neuron
A neuron with a single scalar input and no bias appears on the left below.





[Figure: left, input p and a neuron without bias, a = f(wp); right, input p and a neuron with bias, a = f(wp + b).]


The scalar input p is transmitted through a connection that multiplies its
strength by the scalar weight w, to form the product wp, again a scalar. Here
the weighted input wp is the only argument of the transfer function f, which
produces the scalar output a. The neuron on the right has a scalar bias, b. You
may view the bias as simply being added to the product wp as shown by the
summing junction or as shifting the function f to the left by an amount b. The
bias is much like a weight, except that it has a constant input of 1.


The transfer function net input n, again a scalar, is the sum of the weighted
input wp and the bias b. This sum is the argument of the transfer function f.
(Chapter 7 discusses a different way to form the net input n.) Here f is a
transfer function, typically a step function or a sigmoid function, which takes
the argument n and produces the output a. Examples of various transfer
functions are given in the next section. Note that w and b are both adjustable
scalar parameters of the neuron. The central idea of neural networks is that
such parameters can be adjusted so that the network exhibits some desired or
interesting behavior. Thus, we can train the network to do a particular job by
adjusting the weight or bias parameters, or perhaps the network itself will
adjust these parameters to achieve some desired end.
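The scalar neuron described above can be sketched in a few lines of plain Python (not toolbox code). The transfer function is passed in as an argument; `simple_neuron` and `step` are illustrative names, and `step` is just one example of a step function.

```python
# Sketch of the scalar neuron a = f(w*p + b); f can be any transfer function.

def simple_neuron(p, w, b=0.0, f=lambda n: n):
    """Return a = f(w*p + b). With b = 0 this is the neuron without bias."""
    n = w * p + b          # net input: weighted input plus bias
    return f(n)


def step(n):
    """A step transfer function: 1 when n >= 0, otherwise 0."""
    return 1.0 if n >= 0 else 0.0


p = 2.0
print(simple_neuron(p, w=3.0))                  # no bias, linear f: 6.0
print(simple_neuron(p, w=3.0, b=-1.5))          # with bias: 4.5
# Adjusting only the bias flips the step neuron's decision for the same input.
print(simple_neuron(p, w=1.0, b=-1.0, f=step))  # n = 1.0  -> 1.0
print(simple_neuron(p, w=1.0, b=-3.0, f=step))  # n = -1.0 -> 0.0
```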







All of the neurons in this toolbox have provision for a bias, and a bias is used
in many of our examples and will be assumed in most of this toolbox. However,
you may omit a bias in a neuron if you want.


As previously noted, the bias b is an adjustable (scalar) parameter of the
neuron. It is not an input. However, the constant 1 that drives the bias is an
input and must be treated as such when considering the linear dependence of
input vectors in Chapter 4, "Linear Filters."
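One way to see why the constant 1 counts as an input is to fold the bias into the weights: append b to the weight row and a constant 1 to the input vector, and the net input is unchanged. This is a plain-Python sketch with illustrative function names, not toolbox code.

```python
# Sketch: the bias b behaves like a weight whose input is the constant 1.

def net_input(w, p, b):
    """n = w1*p1 + ... + wR*pR + b."""
    return sum(wi * pi for wi, pi in zip(w, p)) + b


def net_input_augmented(w, p, b):
    """Same net input, with b appended to the weights and 1 appended to p."""
    w_aug = list(w) + [b]       # b is an adjustable parameter...
    p_aug = list(p) + [1.0]     # ...driven by the constant input 1
    return sum(wi * pi for wi, pi in zip(w_aug, p_aug))


w, p, b = [1.0, 2.0], [3.0, -1.0], 0.5
print(net_input(w, p, b), net_input_augmented(w, p, b))   # 1.5 1.5
```

In the augmented form every input vector carries an extra element equal to 1, which is why that element must be included when checking input vectors for linear dependence.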


Transfer Functions


Many transfer functions are included in this toolbox. A complete list of them
can be found in "Transfer Function Graphs" in Chapter 14. Three of the most
commonly used functions are shown below.





[Figure: graph of a = hardlim(n)]


Hard-Limit Transfer Function


The hard-limit transfer function shown above limits the output of the neuron
to either 0, if the net input argument n is less than 0, or 1, if n is greater than
or equal to 0. We will use this function in Chapter 3, "Perceptrons," to create
neurons that make classification decisions.


The toolbox has a function, hardlim, to realize the mathematical hard-limit
transfer function shown above. Try the code shown below.


n = -5:0.1:5;
plot(n,hardlim(n),'c+:')


It produces a plot of the function hardlim over the range -5 to +5.
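For readers without MATLAB at hand, the following plain-Python sketch (not the toolbox's hardlim) evaluates the same hard-limit rule on the same grid as n = -5:0.1:5 and tabulates the result instead of plotting it.

```python
# Sketch: a plain-Python hard-limit function on the grid n = -5:0.1:5.

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1.0 if n >= 0 else 0.0


# i / 10 for i in -50..50 gives the 101 points -5.0, -4.9, ..., 5.0 without
# the round-off that repeated addition of 0.1 would accumulate.
n_values = [i / 10 for i in range(-50, 51)]
a_values = [hardlim(n) for n in n_values]

print(len(n_values), sum(a_values))   # 101 points; the 51 with n >= 0 map to 1
```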


All of the mathematical transfer functions in the toolbox can be realized with
a function having the same name.


The linear transfer function is shown below.
[Figure: graph of a = purelin(n)]


Linear Transfer Function


Neurons of this type are used as linear approximators in "Linear Filters" in
Chapter 4.


The sigmoid transfer function shown below takes the input, which may have
any value between plus and minus infinity, and squashes the output into the
range 0 to 1.





[Figure: graph of a = logsig(n)]
Log-Sigmoid Transfer Function


This transfer function is commonly used in backpropagation networks, in part 
because it is differentiable. 
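The differentiability mentioned above can be checked numerically. This plain-Python sketch (not toolbox code) implements the log-sigmoid 1 / (1 + exp(-n)), uses the standard identity da/dn = a(1 - a), and compares it with a central-difference estimate; `central_difference` is an illustrative helper.

```python
import math

# Sketch: log-sigmoid a = 1 / (1 + exp(-n)) and its derivative.

def logsig(n):
    """Log-sigmoid; mathematically in (0, 1) for every real n."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)            # avoids overflow of exp(-n) for large negative n
    return e / (1.0 + e)


def logsig_derivative(n):
    """da/dn = a * (1 - a), computed from the output a itself."""
    a = logsig(n)
    return a * (1.0 - a)


def central_difference(f, n, h=1e-6):
    """Numerical derivative, used here only to check the closed form."""
    return (f(n + h) - f(n - h)) / (2.0 * h)


print(logsig(0.0), logsig_derivative(0.0))   # 0.5 0.25
print(logsig_derivative(1.3), central_difference(logsig, 1.3))
```

Because the derivative is available in closed form from the output itself, it is cheap to evaluate during training.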


The symbol in the square to the right of each transfer function graph shown
above represents the associated transfer function. These icons will replace the
general f in the boxes of network diagrams to show the particular transfer
function being used.


For a complete listing of transfer functions and their icons, see the "Transfer
Function Graphs" in Chapter 14. You can also specify your own transfer
functions. You are not limited to the transfer functions listed in Chapter 14.







You can experiment with a simple neuron and various transfer functions by
running the demonstration program nnd2n1.


Neuron with Vector Input


A neuron with a single R-element input vector is shown below. Here the
individual element inputs

$$p_1, p_2, \dots, p_R$$

are multiplied by weights

$$w_{1,1}, w_{1,2}, \dots, w_{1,R}$$

and the weighted values are fed to the summing junction. Their sum is simply
Wp, the dot product of the (single row) matrix W and the vector p.


[Figure: Input and Neuron w Vector Input, a = f(Wp + b), where R = number of elements in input vector]


The neuron has a bias b, which is summed with the weighted inputs to form
the net input n. This sum, n, is the argument of the transfer function f.


$$n = w_{1,1}p_1 + w_{1,2}p_2 + \dots + w_{1,R}p_R + b$$


This expression can, of course, be written in MATLAB code as:
n = W*p + b


However, the user will seldom be writing code at this low level, for such code is 
already built into functions to define and simulate entire networks. 
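Purely to show what that low-level computation does, here is a plain-Python sketch of one neuron with an R-element input. `W_row` stands for the single row of W; the names and values are illustrative.

```python
# Sketch: one neuron with an R-element input, n = W p + b, a = f(n).

def purelin(n):
    """Linear transfer function: a = n."""
    return n


def neuron(W_row, p, b, f=purelin):
    """W_row is the single row of W (length R); p has R elements."""
    if len(W_row) != len(p):
        raise ValueError("W must be 1 x R when p has R elements")
    n = sum(w * x for w, x in zip(W_row, p)) + b   # w11*p1 + ... + w1R*pR + b
    return f(n)


W_row = [0.5, -1.0, 2.0]   # R = 3
p = [2.0, 1.0, 0.5]
print(neuron(W_row, p, b=0.25))   # 1.0 - 1.0 + 1.0 + 0.25 = 1.25
```

The explicit length check mirrors the requirement that W be 1 x R; MATLAB would raise a dimension error for `W*p` in the same situation.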
The figure of a single neuron shown above contains a lot of detail. When we
consider networks with many neurons and perhaps layers of many neurons,
there is so much detail that the main thoughts tend to be lost. Thus, the
authors have devised an abbreviated notation for an individual neuron. This
notation, which will be used later in circuits of multiple neurons, is illustrated
in the diagram shown below.


[Figure: Input and Neuron in abbreviated notation, input p (R x 1), weight matrix W (1 x R), bias b, output a = f(Wp + b), where R = number of elements in input vector]


Here the input vector p is represented by the solid dark vertical bar at the left.
The dimensions of p are shown below the symbol p in the figure as R x 1. (Note
that we will use a capital letter, such as R in the previous sentence, when
referring to the size of a vector.) Thus, p is a vector of R input elements. These
inputs postmultiply the single-row, R-column matrix W. As before, a constant
1 enters the neuron as an input and is multiplied by a scalar bias b. The net
input to the transfer function f is n, the sum of the bias b and the product Wp.
This sum is passed to the transfer function f to get the neuron's output a, which
in this case is a scalar. Note that if we had more than one neuron, the network
output would be a vector.


A layer of a network is defined in the figure shown above. A layer includes the
combination of the weights, the multiplication and summing operation (here
realized as a vector product Wp), the bias b, and the transfer function f. The
array of inputs, vector p, is not included in or called a layer.


Each time this abbreviated network notation is used, the sizes of the matrices
will be shown just below their matrix variable names. We hope that this
notation will allow you to understand the architectures and follow the matrix
mathematics associated with them.







As discussed previously, when a specific transfer function is to be used in a
figure, the symbol for that transfer function will replace the f shown above.
Here are some examples.


[Figure: transfer function icons for hardlim, purelin, and logsig]


You can experiment with a two-element neuron by running the demonstration 
program nnd2n2. 
Network Architectures


Two or more of the neurons shown earlier can be combined in a layer, and a
particular network could contain one or more such layers. First consider a
single layer of neurons.


A Layer of Neurons


A one-layer network with R input elements and S neurons follows.


[Figure: Input and Layer of Neurons, a = f(Wp + b), where R = number of elements in input vector and S = number of neurons in layer]


In this network, each element of the input vector p is connected to each neuron
input through the weight matrix W. The ith neuron has a summer that gathers
its weighted inputs and bias to form its own scalar output n(i). The various n(i)
taken together form an S-element net input vector n. Finally, the neuron layer
outputs form a column vector a. We show the expression for a at the bottom of
the figure.


Note that it is common for the number of inputs to a layer to be different from
the number of neurons (i.e., R ≠ S). A layer is not constrained to have the
number of its inputs equal to the number of its neurons.
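A plain-Python sketch of a = f(Wp + b) for a whole layer follows, using S = 2 neurons and R = 3 inputs to show that S and R need not match. W is stored as a list of S rows; the function names and values are illustrative, not toolbox code.

```python
# Sketch: a layer of S neurons with R inputs, a = f(W p + b).

def hardlim(n):
    return 1.0 if n >= 0 else 0.0


def purelin(n):
    return n


def layer(W, p, b, f):
    """W is S x R (a list of S rows), p has R elements, b has S elements."""
    S, R = len(W), len(p)
    if len(b) != S or any(len(row) != R for row in W):
        raise ValueError("expected W: S x R, p: R elements, b: S elements")
    # Row i of W holds the weights into neuron i; column j is input element j.
    n = [sum(W[i][j] * p[j] for j in range(R)) + b[i] for i in range(S)]
    return [f(n_i) for n_i in n]


W = [[1.0, 0.0, -1.0],     # S = 2 neurons, R = 3 inputs (R != S)
     [2.0, 1.0, 0.0]]
p = [1.0, 2.0, 3.0]
b = [0.5, -1.0]
print(layer(W, p, b, purelin))   # net inputs: [-1.5, 3.0]
print(layer(W, p, b, hardlim))   # [0.0, 1.0]
```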







You can create a single (composite) layer of neurons having different transfer
functions simply by putting two of the networks shown earlier in parallel. Both
networks would have the same inputs, and each network would create some of
the outputs.


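The input vector elements enter the network through the weight matrix W.


$$\mathbf{W} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,R} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,R} \\ \vdots & \vdots & & \vdots \\ w_{S,1} & w_{S,2} & \cdots & w_{S,R} \end{bmatrix}$$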


Note that the row indices on the elements of matrix W indicate the destination
neuron of the weight, and the column indices indicate which source is the input
for that weight. Thus, the indices in $w_{1,2}$ say that the strength of the signal
from the second input element to the first neuron is $w_{1,2}$.


The S-neuron, R-input one-layer network also can be drawn in abbreviated
notation.


[Figure: Input and Layer of Neurons in abbreviated notation, a = f(Wp + b), where R = number of elements in input vector and S = number of neurons in layer 1]


Here p is an R-length input vector, W is an S x R matrix, and a and b are
S-length vectors. As defined previously, the neuron layer includes the weight
matrix, the multiplication operations, the bias vector b, the summer, and the
transfer function boxes.
Inputs and Layers


We are about to discuss networks having multiple layers so we will need to
extend our notation to talk about such networks. Specifically, we need to make
a distinction between weight matrices that are connected to inputs and weight
matrices that are connected between layers. We also need to identify the source
and destination for the weight matrices.


We will call weight matrices connected to inputs input weights, and we will call
weight matrices coming from layer outputs layer weights. Further, we will use
superscripts to identify the source (second index) and the destination (first
index) for the various weights and other elements of the network. To illustrate,
we have taken the one-layer multiple input network shown earlier and
redrawn it in abbreviated form below.


[Figure: Input and Layer 1, $\mathbf{a}^1 = f^1(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1)$, where R = number of elements in input vector and $S^1$ = number of neurons in Layer 1]


As you can see, we have labeled the weight matrix connected to the input vector
p as an Input Weight matrix ($\mathbf{IW}^{1,1}$) having a source 1 (second index) and a
destination 1 (first index). Also, elements of layer one, such as its bias, net
input, and output have a superscript 1 to say that they are associated with the
first layer.


In the next section, we will use Layer Weight (LW) matrices as well as Input
Weight (IW) matrices.


You might recall from the notation section of the Preface that conversion of the
input weight matrix from math to code for a particular network called net is:


$\mathbf{IW}^{1,1} \rightarrow$ net.IW{1,1}


Thus, we could write the code to obtain the net input to the transfer function as:
n{1} = net.IW{1,1}*p + net.b{1};


Multiple Layers of Neurons 


A network can have several layers. Each layer has a weight matrix W, a bias
vector b, and an output vector a. To distinguish between the weight matrices,
output vectors, etc., for each of these layers in our figures, we append the
number of the layer as a superscript to the variable of interest. You can see the
use of this layer notation in the three-layer network shown below, and in the
equations at the bottom of the figure.





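$$\mathbf{a}^1 = f^1(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1) \qquad \mathbf{a}^2 = f^2(\mathbf{LW}^{2,1}\mathbf{a}^1 + \mathbf{b}^2) \qquad \mathbf{a}^3 = f^3(\mathbf{LW}^{3,2}\mathbf{a}^2 + \mathbf{b}^3)$$


$$\mathbf{a}^3 = f^3\left(\mathbf{LW}^{3,2} f^2\left(\mathbf{LW}^{2,1} f^1(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1) + \mathbf{b}^2\right) + \mathbf{b}^3\right)$$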


The network shown above has $R$ inputs, $S^1$ neurons in the first layer, $S^2$
neurons in the second layer, etc. It is common for different layers to have
different numbers of neurons. A constant input 1 is fed to the biases for each
neuron.


Note that the outputs of each intermediate layer are the inputs to the following
layer. Thus layer 2 can be analyzed as a one-layer network with $S^1$ inputs, $S^2$
neurons, and an $S^2 \times S^1$ weight matrix $\mathbf{LW}^{2,1}$. The input to layer 2 is $\mathbf{a}^1$; the output
is $\mathbf{a}^2$. Now that we have identified all the vectors and matrices of layer 2, we
can treat it as a single-layer network on its own. This approach can be taken
with any layer of the network.
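Because each layer's output is simply the next layer's input, the three-layer equations can be evaluated by calling one single-layer routine three times. The plain-Python sketch below does exactly that; the sizes (R = 2, S1 = 3, S2 = 2, S3 = 1), the weight values, and the use of `math.tanh` as a stand-in sigmoid are illustrative assumptions, not toolbox defaults.

```python
import math

# Sketch: a three-layer forward pass built from one single-layer routine.

def layer(W, x, b, f):
    """Return f(W x + b) for W given as a list of rows."""
    if len(b) != len(W) or any(len(row) != len(x) for row in W):
        raise ValueError("weight, input, and bias sizes do not agree")
    return [f(sum(w * xj for w, xj in zip(row, x)) + bi) for row, bi in zip(W, b)]


def purelin(n):
    return n


def forward(p, IW11, b1, f1, LW21, b2, f2, LW32, b3, f3):
    a1 = layer(IW11, p, b1, f1)    # a1 = f1(IW{1,1} p + b1), S1 elements
    a2 = layer(LW21, a1, b2, f2)   # a2 = f2(LW{2,1} a1 + b2): a1 is layer 2's input
    a3 = layer(LW32, a2, b3, f3)   # a3 = f3(LW{3,2} a2 + b3) = y
    return a1, a2, a3


# R = 2 inputs, S1 = 3, S2 = 2, S3 = 1 neurons (illustrative values).
IW11, b1 = [[1, 0], [0, 1], [1, 1]], [0, 0, 0]
LW21, b2 = [[1, 1, 1], [1, -1, 0]], [0, 1]
LW32, b3 = [[2, 3]], [-1]

a1, a2, y = forward([1, 2], IW11, b1, purelin, LW21, b2, purelin, LW32, b3, purelin)
print(a1, a2, y)   # [1, 2, 3] [6, 0] [11]

# Same weights with nonlinear hidden layers (tanh stands in for a sigmoid here).
_, _, y_nl = forward([1, 2], IW11, b1, math.tanh, LW21, b2, math.tanh, LW32, b3, purelin)
```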


The layers of a multilayer network play different roles. A layer that produces
the network output is called an output layer. All other layers are called hidden
layers. The three-layer network shown earlier has one output layer (layer 3)
and two hidden layers (layer 1 and layer 2). Some authors refer to the inputs
as a fourth layer. We will not use that designation.


The same three-layer network discussed previously also can be drawn using 
our abbreviated notation. 





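[Figure: the three-layer network in abbreviated notation, with layer outputs of size $S^1 \times 1$, $S^2 \times 1$, and $S^3 \times 1$]


$$\mathbf{a}^1 = f^1(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1) \qquad \mathbf{a}^2 = f^2(\mathbf{LW}^{2,1}\mathbf{a}^1 + \mathbf{b}^2) \qquad \mathbf{a}^3 = f^3(\mathbf{LW}^{3,2}\mathbf{a}^2 + \mathbf{b}^3)$$


$$\mathbf{a}^3 = f^3\left(\mathbf{LW}^{3,2} f^2\left(\mathbf{LW}^{2,1} f^1(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1) + \mathbf{b}^2\right) + \mathbf{b}^3\right) = \mathbf{y}$$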


Multiple-layer networks are quite powerful. For instance, a network of two
layers, where the first layer is sigmoid and the second layer is linear, can be
trained to approximate any function (with a finite number of discontinuities)
arbitrarily well. This kind of two-layer network is used extensively in Chapter
5, "Backpropagation."


Here we assume that the output of the third layer, $\mathbf{a}^3$, is the network output of
interest, and we have labeled this output as $\mathbf{y}$. We will use this notation to
specify the output of multilayer networks.
Data Structures


This section discusses how the format of input data structures affects the
simulation of networks. We will begin with static networks, and then move to
dynamic networks.


We are concerned with two basic types of input vectors: those that occur
concurrently (at the same time, or in no particular time sequence), and those
that occur sequentially in time. For concurrent vectors, the order is not
important, and if we had a number of networks running in parallel, we could
present one input vector to each of the networks. For sequential vectors, the
order in which the vectors appear is important.


Simulation With Concurrent Inputs in a Static Network


The simplest situation for simulating a network occurs when the network to be
simulated is static (has no feedback or delays). In this case, we do not have to
be concerned about whether or not the input vectors occur in a particular time
sequence, so we can treat the inputs as concurrent. In addition, we make the
problem even simpler by assuming that the network has only one input vector.
Use the following network as an example.


[Figure: inputs and a single linear neuron]

$$a = \mathrm{purelin}(Wp + b)$$


To set up this feedforward network, we can use the following command. 
net = newlin([1 3;1 3],1)


For simplicity, assign the weight matrix and bias to be
$$W = \begin{bmatrix} 1 & 2 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 0 \end{bmatrix}$$
The commands for these assignments are 


net.IW{1,1} = [1 2]; 
net.b{1} = 0 


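Suppose that the network simulation data set consists of Q = 4 concurrent
vectors:

$$p_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad p_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad p_3 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad p_4 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$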


Concurrent vectors are presented to the network as a single matrix:
P = [1 2 2 3; 2 1 3 1];


We can now simulate the network: 


A = sim(net,P)
A =
     5     4     8     5


A single matrix of concurrent vectors is presented to the network, and the
network produces a single matrix of concurrent vectors as output. The result
would be the same if there were four networks operating in parallel and each
network received one of the input vectors and produced one of the outputs. The
ordering of the input vectors is not important, as they do not interact with each
other.
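In the concurrent case, the network evaluates a = Wp + b once per column of P. The plain-Python sketch below reproduces the output above, with lists standing in for MATLAB matrices. The function name simulate_static is hypothetical. The sketch also shows that permuting the columns only permutes the outputs.

```python
def simulate_static(W, b, P):
    # W: weight row of length R; b: scalar bias
    # P: R rows, one column per concurrent input vector
    # Each column is handled independently: a_q = W p_q + b (purelin)
    R, Q = len(P), len(P[0])
    return [sum(W[r] * P[r][q] for r in range(R)) + b for q in range(Q)]


W = [1, 2]
b = 0
P = [[1, 2, 2, 3],
     [2, 1, 3, 1]]
print(simulate_static(W, b, P))  # [5, 4, 8, 5]

# Reversing the order of the concurrent vectors only reverses the outputs
P_reversed = [row[::-1] for row in P]
print(simulate_static(W, b, P_reversed))  # [5, 8, 4, 5]
```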


Simulation With Sequential Inputs in a Dynamic Network


When a network contains delays, the input to the network would normally be
a sequence of input vectors that occur in a certain time order. To illustrate this
case, we use a simple network that contains one delay.







[Figure: input p(t) and its one-step delay p(t-1) feeding a single linear neuron]

$$a(t) = w_{1,1}\,p(t) + w_{1,2}\,p(t-1)$$


The following commands create this network: 


net = newlin([-1 1],1,[0 1]);
net.biasConnect = 0


Assign the weight matrix to be
$$W = \begin{bmatrix} 1 & 2 \end{bmatrix}$$
The command is 

net.IW{1,1} = [1 2]; 


Suppose that the input sequence is
$$p_1 = [1], \quad p_2 = [2], \quad p_3 = [3], \quad p_4 = [4]$$


Sequential inputs are presented to the network as elements of a cell array: 
P = {1 2 3 4};
We can now simulate the network:

A = sim(net,P)
A =
    [1]    [4]    [7]    [10]


We input a cell array containing a sequence of inputs, and the network
produced a cell array containing a sequence of outputs. Note that the order of
the inputs is important when they are presented as a sequence. In this case,
the current output is obtained by multiplying the current input by 1 and the
preceding input by 2 and summing the result. If we were to change the order of
the inputs, the numbers we obtain in the output would change.
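To see why the order matters, the one-delay network can be simulated one step at a time. This plain-Python sketch assumes a zero initial delay state, as in the example above. The name simulate_delay is a hypothetical helper, not a toolbox function. Reversing the input sequence gives a different output sequence, not merely a reversed one.

```python
def simulate_delay(w11, w12, seq, p_init=0):
    # a(t) = w11 * p(t) + w12 * p(t-1); the delay holds p_init before the first step
    outputs = []
    prev = p_init
    for p in seq:
        outputs.append(w11 * p + w12 * prev)
        prev = p  # current input becomes the delayed input at the next step
    return outputs


print(simulate_delay(1, 2, [1, 2, 3, 4]))  # [1, 4, 7, 10]
print(simulate_delay(1, 2, [4, 3, 2, 1]))  # [4, 11, 8, 5]
```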


Simulation With Concurrent Inputs in a Dynamic Network


If we were to apply the same inputs from the previous example as a set of
concurrent inputs instead of a sequence of inputs, we would obtain a
completely different response. (However, it is not clear why we would want to
do this with a dynamic network.) It would be as if each input were applied
concurrently to a separate parallel network. For the previous example, if we
use a concurrent set of inputs we have


$$p_1 = [1], \quad p_2 = [2], \quad p_3 = [3], \quad p_4 = [4]$$


which can be created with the following code: 
P = [1 2 3 4];
When we simulate with concurrent inputs we obtain

A = sim(net,P)
A =
     1     2     3     4


The result is the same as if we had concurrently applied each one of the inputs
to a separate network and computed one output. Note that since we did not
assign any initial conditions to the network delays, they were assumed to be
zero. For this case the output will simply be 1 times the input, since the weight
that multiplies the current input is 1.
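Each concurrent input is seen by its own copy of the network for a single time step, so the delayed term only ever contributes its initial value. The following is a minimal plain-Python sketch, assuming zero initial delay conditions as stated above. The helper name is hypothetical.

```python
def simulate_delay_concurrent(w11, w12, inputs, p_init=0):
    # Each concurrent input drives its own copy of the network for one step,
    # so the delayed term only ever sees the initial delay state p_init
    return [w11 * p + w12 * p_init for p in inputs]


print(simulate_delay_concurrent(1, 2, [1, 2, 3, 4]))  # [1, 2, 3, 4]
```

With a nonzero initial delay state, every output would be shifted by the same amount w12 * p_init.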


In certain special cases, we might want to simulate the network response to
several different sequences at the same time. In this case, we would want to
present the network with a concurrent set of sequences. For example, let's say
we wanted to present the following two sequences to the network:


$$p_1(1) = [1], \quad p_1(2) = [2], \quad p_1(3) = [3], \quad p_1(4) = [4]$$
$$p_2(1) = [4], \quad p_2(2) = [3], \quad p_2(3) = [2], \quad p_2(4) = [1]$$







The input P should be a cell array, where each element of the array contains
the two elements of the two sequences that occur at the same time:

P = {[1 4] [2 3] [3 2] [4 1]};
We can now simulate the network:
A = sim(net,P);
The resulting network output would be

A = {[1 4] [4 11] [7 8] [10 5]}


As you can see, the first column of each matrix makes up the output sequence 
produced by the first input sequence, which was the one we used in an earlier 
example. The second column of each matrix makes up the output sequence 
produced by the second input sequence. There is no interaction between the 
two concurrent sequences. It is as if they were each applied to separate
networks running in parallel. 


The following diagram shows the general format for the input P to the sim
function when we have Q concurrent sequences of TS time steps. It covers all
cases where there is a single input vector. Each element of the cell array is a
matrix of concurrent vectors that correspond to the same point in time for each
sequence. If there are multiple input vectors, there will be multiple rows of
matrices in the cell array.


[Figure labels: First Sequence (the p1 entries), Qth Sequence (the pQ entries)]

{[p1(1), p2(1), ..., pQ(1)], [p1(2), p2(2), ..., pQ(2)], ..., [p1(TS), p2(TS), ..., pQ(TS)]}
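The layout above has one cell per time step and one column per sequence. The sketch below builds it with Python lists standing in for the cell array. It assumes zero initial delay states, and both helper names are hypothetical. Each column evolves on its own, which is why the two example sequences do not interact.

```python
def to_cell_layout(sequences):
    # Q sequences of TS steps -> TS time steps, each holding the Q values
    # that occur at that time (one column per sequence)
    return [list(step) for step in zip(*sequences)]


def simulate_delay_concurrent_seqs(w11, w12, P_cells):
    # P_cells[t][q] is the input of sequence q at time step t;
    # every sequence keeps its own delay state, initialised to zero
    prev = [0] * len(P_cells[0])
    A_cells = []
    for step in P_cells:
        A_cells.append([w11 * p + w12 * d for p, d in zip(step, prev)])
        prev = step
    return A_cells


P = to_cell_layout([[1, 2, 3, 4], [4, 3, 2, 1]])
print(P)  # [[1, 4], [2, 3], [3, 2], [4, 1]]
print(simulate_delay_concurrent_seqs(1, 2, P))  # [[1, 4], [4, 11], [7, 8], [10, 5]]
```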


In this section, we have applied sequential and concurrent inputs to dynamic 
networks. In the previous section, we applied concurrent inputs to static 
networks. It is also possible to apply sequential inputs to static networks. It
will not change the simulated response of the network, but it can affect the way
in which the network is trained. This will become clear in the next section.
Training Styles


In this section, we describe two different styles of training. In incremental
training the weights and biases of the network are updated each time an input
is presented to the network. In batch training the weights and biases are only
updated after all of the inputs are presented.


Incremental Training (of Adaptive and Other Networks)


Incremental training can be applied to both static and dynamic networks,
although it is more commonly used with dynamic networks, such as adaptive 
filters. In this section, we demonstrate how incremental training is performed 
on both static and dynamic networks. 


Incremental Training with Static Networks 


Consider again the static network we used for our first example. We want to
train it incrementally, so that the weights and biases will be updated after each
input is presented. In this case we use the function adapt, and we present the 
inputs and targets as sequences. 


Suppose we want to train the network to create the linear function 


$$t = 2p_1 + p_2$$


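Then for the previous inputs we used,

$$p_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad p_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad p_3 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad p_4 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$

the targets would be

$$t_1 = [4], \quad t_2 = [5], \quad t_3 = [7], \quad t_4 = [7]$$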


We first set up the network with zero initial weights and biases. We also set 
the learning rate to zero initially, to show the effect of the incremental training.


net = newlin([-1 1;-1 1],1,0,0);
net.IW{1,1} = [0 0]; 
net.b{1} = 0 







For incremental training we want to present the inputs and targets as
sequences:

P = {[1;2] [2;1] [2;3] [3;1]};
T = {4 5 7 7};


Recall from the earlier discussion that for a static network the simulation of the
network produces the same outputs whether the inputs are presented as a
matrix of concurrent vectors or as a cell array of sequential vectors. This is not
true when training the network, however. When using the adapt function, if
the inputs are presented as a cell array of sequential vectors, then the weights
are updated as each input is presented (incremental mode). As we see in the
next section, if the inputs are presented as a matrix of concurrent vectors, then
the weights are updated only after all inputs are presented (batch mode).


We are now ready to train the network incrementally. 
[net,a,e,pf] = adapt(net,P,T);


The network outputs will remain zero, since the learning rate is zero, and the 
weights are not updated. The errors will be equal to the targets: 


a = [0]    [0]    [0]    [0]
e = [4]    [5]    [7]    [7]


If we now set the learning rate to 0.1 we can see how the network is adjusted
as each input is presented:

net.inputWeights{1,1}.learnParam.lr = 0.1;
net.biases{1,1}.learnParam.lr = 0.1;
[net,a,e,pf] = adapt(net,P,T);

a = [0]    [2]    [6.0]    [5.8]

e = [4]    [3]    [1.0]    [1.2]


The first output is the same as it was with zero learning rate, since no update 
is made until the first input is presented. The second output is different, since 
the weights have been updated. The weights continue to be modified as each 
error is computed. If the network is capable and the learning rate is set
correctly, the error will eventually be driven to zero.
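The numbers above follow from applying the Widrow-Hoff rule after every presentation. With a = Wp + b and e = t - a, the update is W + lr*e*p^T for the weights and b + lr*e for the bias. The plain-Python sketch below reimplements that arithmetic under this assumption, using the same zero initial weights, learning rate 0.1, and data. The name adapt_incremental is hypothetical; it is not the toolbox adapt. For example, after the first input (e = 4) the weights become [0.4 0.8] and the bias 0.4, so the second output is 0.4*2 + 0.8*1 + 0.4 = 2.

```python
def adapt_incremental(W, b, inputs, targets, lr):
    # Widrow-Hoff (LMS) update after every presentation:
    #   a = W p + b,  e = t - a,  W <- W + lr * e * p',  b <- b + lr * e
    W = list(W)
    outputs, errors = [], []
    for p, t in zip(inputs, targets):
        a = sum(w * x for w, x in zip(W, p)) + b
        e = t - a
        W = [w + lr * e * x for w, x in zip(W, p)]
        b = b + lr * e
        outputs.append(a)
        errors.append(e)
    return W, b, outputs, errors


P = [[1, 2], [2, 1], [2, 3], [3, 1]]
T = [4, 5, 7, 7]
W, b, a, e = adapt_incremental([0, 0], 0, P, T, lr=0.1)
print([round(v, 4) for v in a])  # [0, 2.0, 6.0, 5.8]
print([round(v, 4) for v in e])  # [4, 3.0, 1.0, 1.2]
```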


Incremental Training with Dynamic Networks 


We can also train dynamic networks incrementally. In fact, this would be the 
most common situation. Let's take the linear network with one delay at the
input that we used in a previous example. We initialize the weights to zero and
set the learning rate to 0.1. 


net = newlin([-1 1],1,[0 1],0.1); 
net.IW{1,1} = [0 0]; 
net.biasConnect = 0


To train this network incrementally we present the inputs and targets as 
elements of cell arrays. 


Pi = {1};
P = {2 3 4};
T = {3 5 7};


Here we attempt to train the network to sum the current and previous inputs 
to create the current output. This is the same input sequence we used in the 
previous example of using sim, except that we assign the first term in the 
sequence as the initial condition for the delay. We now can sequentially train 
the network using adapt. 


[net,a,e,pf] = adapt(net,P,T,Pi);
a = [0]    [2.4]    [7.98]
e = [3]    [2.6]    [-0.98]


The first output is zero, since the weights have not yet been updated. The 
weights change at each subsequent time step. 
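The outputs above can be reproduced with the same per-step Widrow-Hoff update, applied to the tapped delay line [p(t), p(t-1)] instead of a static input vector. This is a plain-Python sketch, not the toolbox adapt. The helper name is hypothetical, and it assumes no bias and the initial delay value 1 from Pi. For example, after the first step (e = 3) the weights become [0.6 0.3], so the second output is 0.6*3 + 0.3*2 = 2.4.

```python
def adapt_delay_incremental(w, inputs, targets, p_init, lr):
    # Network a(t) = w[0] * p(t) + w[1] * p(t-1), no bias (biasConnect = 0)
    # Widrow-Hoff update after every time step: w <- w + lr * e * [p(t), p(t-1)]
    w = list(w)
    prev = p_init                 # initial delay condition, like Pi
    outputs, errors = [], []
    for p, t in zip(inputs, targets):
        x = [p, prev]             # current contents of the tapped delay line
        a = w[0] * x[0] + w[1] * x[1]
        e = t - a
        w = [wi + lr * e * xi for wi, xi in zip(w, x)]
        outputs.append(a)
        errors.append(e)
        prev = p
    return w, outputs, errors


w, a, e = adapt_delay_incremental([0, 0], [2, 3, 4], [3, 5, 7], p_init=1, lr=0.1)
print([round(v, 4) for v in a])  # [0, 2.4, 7.98]
print([round(v, 4) for v in e])  # [3, 2.6, -0.98]
```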


Batch Training

Batch training, in which weights and biases are only updated after all of the 
inputs and targets are presented, can be applied to both static and dynamic 
networks. We discuss both types of networks in this section.


Batch Training with Static Networks 

Batch training can be done using either adapt or train, although train is 
generally the best option, since it typically has access to more efficient training
algorithms. Incremental training can only be done with adapt; train can only 
perform batch training. 


Let's begin with the static network we used in previous examples. The learning
rate will be set to 0.1. 


net = newlin([-1 1;-1 1],1,0,0.1); 







net.IW{1,1} = [0 0]; 
net.b{1} = 0 


For batch training of a static network with adapt, the input vectors must be 
placed in one matrix of concurrent vectors. 


P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];


When we call adapt, it will invoke trains (which is the default adaptation
function for the linear network) and learnwh (which is the default learning
function for the weights and biases). Therefore, Widrow-Hoff learning is used.


[net,a,e,pf] = adapt(net,P,T);
a = 0     0     0     0
e = 4     5     7     7


Note that the outputs of the network are all zero, because the weights are not
updated until all of the training set has been presented. If we display the
weights we find:

>> net.IW{1,1}
ans = 4.9000    4.1000
>> net.b{1}
ans =
    2.3000


This is different from the result we had after one pass of adapt with
incremental updating. 
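In batch mode, the individual Widrow-Hoff changes are summed before any of them is applied: ΔW = lr * Σ e_q p_q^T and Δb = lr * Σ e_q. Because the weights start at zero, every error equals its target. The first weight change is therefore 0.1 × (4·1 + 5·2 + 7·2 + 7·3) = 4.9, and the bias change is 0.1 × (4 + 5 + 7 + 7) = 2.3. The following is a plain-Python sketch of that arithmetic under these assumptions. The name adapt_batch is hypothetical, not the toolbox function.

```python
def adapt_batch(W, b, inputs, targets, lr):
    # Batch Widrow-Hoff: every output is computed with the same, unchanged
    # weights; the individual updates are summed and applied once at the end
    a = [sum(w * x for w, x in zip(W, p)) + b for p in inputs]
    e = [t - ai for t, ai in zip(targets, a)]
    dW = [lr * sum(ei * p[j] for ei, p in zip(e, inputs)) for j in range(len(W))]
    db = lr * sum(e)
    return [w + d for w, d in zip(W, dW)], b + db, a, e


W, b, a, e = adapt_batch([0, 0], 0, [[1, 2], [2, 1], [2, 3], [3, 1]], [4, 5, 7, 7], lr=0.1)
print(a)                          # [0, 0, 0, 0]
print([round(v, 4) for v in W])   # [4.9, 4.1]
print(round(b, 4))                # 2.3
```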


Now let's perform the same batch training using train. Since the Widrow-Hoff
rule can be used in incremental or batch mode, it can be invoked by adapt or
train. There are several algorithms that can only be used in batch mode (e.g.,
Levenberg-Marquardt), and so these algorithms can only be invoked by train.


The network will be set up in the same way. 


net = newlin([-1 1;-1 1],1,0,0.1); 
net.IW{1,1} = [0 0]; 
net.b{1} = 0 


For this case, the input vectors can either be placed in a matrix of concurrent 
vectors or in a cell array of sequential vectors. Within train any cell array of 
sequential vectors is converted to a matrix of concurrent vectors. This is 
because the network is static, and because train always operates in the batch
mode. Concurrent mode operation is generally used whenever possible,
because it has a more efficient MATLAB implementation.


P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];


Now we are ready to train the network. We will train it for only one epoch, since
we used only one pass of adapt. The default training function for the linear
network is trainc, and the default learning function for the weights and biases
is learnwh, so we should get the same results that we obtained using adapt in
the previous example, where the default adaptation function was trains.


net.inputWeights{1,1}.learnParam.lr = 0.1;
net.biases{1}.learnParam.lr = 0.1;
net.trainParam.epochs = 1;

net = train(net,P,T);


If we display the weights after one epoch of training we find:

>> net.IW{1,1}
ans = 4.9000    4.1000
>> net.b{1}
ans =
    2.3000


This is the same result we had with the batch mode training in adapt. With 
static networks, the adapt function can implement incremental or batch 
training depending on the format of the input data. If the data is presented as
a matrix of concurrent vectors, batch training will occur. If the data is
presented as a sequence, incremental training will occur. This is not true for
train, which always performs batch training, regardless of the format of the 
input. 


Batch Training With Dynamic Networks 


Training static networks is relatively straightforward. If we use train, the
network is trained in batch mode and the inputs are converted to concurrent
vectors (columns of a matrix), even if they are originally passed as a sequence
(elements of a cell array). If we use adapt, the format of the input determines
the method of training. If the inputs are passed as a sequence, then the
network is trained in incremental mode. If the inputs are passed as concurrent
vectors, then batch mode training is used.







With dynamic networks, batch mode training is typically done with train only,
especially if only one training sequence exists. To illustrate this, let's consider
again the linear network with a delay. We use a learning rate of 0.02 for the
training. (When using a gradient descent algorithm, we typically use a smaller
learning rate for batch mode training than incremental training, because all of
the individual gradients are summed together before determining the step
change to the weights.)


net = newlin([-1 1],1,[0 1],0.02); 
net.IW{1,1}=[0 0]; 
net.biasConnect=0 
net.trainParam.epochs = 1; 


PI = {1}; 
P = {2 3 4};
T = {3 5 6};


We want to train the network with the same sequence we used for the 
incremental training earlier, but this time we want to update the weights only
after all of the inputs are applied (batch mode). The network is simulated in
sequential mode because the input is a sequence, but the weights are updated 
in batch mode. 


net=train(net,P,T,PI) 


The weights after one epoch of training are 


>> net.IW{1,1}
ans = 0.9000    0.6200


These are different weights than we would obtain using incremental training,
where the weights would be updated three times during one pass through the 
training set. For batch training the weights are only updated once in each 
epoch. 
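The weights above can be checked by hand. With zero initial weights every output is 0, so the errors equal the targets 3, 5 and 6. The summed update is then 0.02 × (3·[2 1] + 5·[3 2] + 6·[4 3]) = 0.02 × [45 31] = [0.90 0.62]. The same arithmetic is shown below as a plain-Python sketch. It assumes no bias and the initial delay value 1. The helper name is hypothetical, not the toolbox train.

```python
def train_delay_batch_epoch(w, inputs, targets, p_init, lr):
    # One batch epoch for a(t) = w[0] * p(t) + w[1] * p(t-1), no bias.
    # The sequence is simulated in order with the weights held fixed,
    # the per-step updates are summed, and the weights change once.
    prev = p_init
    total = [0.0, 0.0]
    for p, t in zip(inputs, targets):
        x = [p, prev]
        e = t - (w[0] * x[0] + w[1] * x[1])
        total = [g + e * xi for g, xi in zip(total, x)]
        prev = p
    return [wi + lr * g for wi, g in zip(w, total)]


w = train_delay_batch_epoch([0, 0], [2, 3, 4], [3, 5, 6], p_init=1, lr=0.02)
print([round(v, 4) for v in w])  # [0.9, 0.62]
```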
Summary


The inputs to a neuron include its bias and the sum of its weighted inputs 
(using the inner product). The output of a neuron depends on the neuron's 
inputs and on its transfer function. There are many useful transfer functions.


A single neuron cannot do very much. However, several neurons can be
combined into a layer or multiple layers that have great power. Hopefully this
toolbox makes it easy to create and understand such large networks.


The architecture of a network consists of a description of how many layers a
network has, the number of neurons in each layer, each layer's transfer
function, and how the layers connect to each other. The best architecture to use
depends on the type of problem to be represented by the network.


A network effects a computation by mapping input values to output values. The
particular mapping problem to be performed fixes the number of inputs, as well
as the number of outputs for the network.


Aside from the number of neurons in a network's output layer, the number of
neurons in each layer is up to the designer. Except for purely linear networks,
the more neurons in a hidden layer, the more powerful the network.


If a linear mapping needs to be represented, linear neurons should be used.
However, linear networks cannot perform any nonlinear computation. Use of a
nonlinear transfer function makes a network capable of storing nonlinear
relationships between input and output.


A very simple problem can be represented by a single layer of neurons.
However, single-layer networks cannot solve certain problems. Multiple
feed-forward layers give a network greater freedom. For example, any
reasonable function can be represented with a two-layer network: a sigmoid
layer feeding a linear output layer.


Networks with biases can represent relationships between inputs and outputs
more easily than networks without biases. (For example, a neuron without a
bias will always have a net input to the transfer function of zero when all of its
inputs are zero. However, a neuron with a bias can learn to have any net
transfer function input under the same conditions by learning an appropriate
value for the bias.)


Feed-forward networks cannot perform temporal computation. More complex 
networks with internal feedback paths are required for temporal behavior. 







If several input vectors are to be presented to a network, they may be presented
sequentially or concurrently. Batching of concurrent inputs is computationally 
more efficient and may be what is desired in any case. The matrix notation 
used in MATLAB makes batching simple. 


Figures and Equations 


Simple Neuron

[Figure: input and neuron without bias; input and neuron with bias]

$$a = f(wp) \qquad\qquad a = f(wp + b)$$


Hard Limit Transfer Function

[Figure: Hard-Limit Transfer Function, a = hardlim(n)]
Purelin Transfer Function

[Figure: Linear Transfer Function, a = purelin(n)]


Log Sigmoid Transfer Function

[Figure: Log-Sigmoid Transfer Function, a = logsig(n)]







Neuron With Vector Input

[Figure: input vector and neuron with vector input]

Where...
R = number of elements in input vector

Net input:
$$n = w_{1,1}p_1 + w_{1,2}p_2 + \dots + w_{1,R}p_R + b$$


Single Neuron Using Abbreviated Notation

[Figure: input and neuron, abbreviated notation]

Where...
R = number of elements in input vector
Icons for Transfer Functions

[Figure: icons for hardlim, purelin, and logsig]


Layer of Neurons

[Figure: input and layer of neurons]

Where...
R = number of elements in input vector
S = number of neurons in layer 1

$$a = f(Wp + b)$$







A Layer of Neurons

[Figure: input and layer of neurons]

Where...
R = number of elements in input vector
S = number of neurons in layer

$$a = f(Wp + b)$$

Weight Matrix
$$W = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,R} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,R} \\ \vdots & \vdots & & \vdots \\ w_{S,1} & w_{S,2} & \cdots & w_{S,R} \end{bmatrix}$$

Layer of Neurons, Abbreviated Notation


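[Figure: input and layer of neurons, abbreviated notation]

$$a = f(Wp + b)$$

Layer of Neurons Showing Indices

[Figure: input and layer 1, with weight and bias indices]

$$a^1 = f^1(IW^{1,1}p + b^1)$$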


Where...
R = number of elements in input vector
S1 = number of neurons in layer 1

Three Layers of Neurons










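[Figure: three layers of neurons]

$$a^1 = f^1(IW^{1,1}p + b^1) \qquad a^2 = f^2(LW^{2,1}a^1 + b^2) \qquad a^3 = f^3(LW^{3,2}a^2 + b^3)$$

$$a^3 = f^3\left(LW^{3,2} f^2\left(LW^{2,1} f^1\left(IW^{1,1}p + b^1\right) + b^2\right) + b^3\right)$$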


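Three Layers, Abbreviated Notation

[Figure: three layers, abbreviated notation]

$$a^1 = f^1(IW^{1,1}p + b^1) \qquad a^2 = f^2(LW^{2,1}a^1 + b^2) \qquad a^3 = f^3(LW^{3,2}a^2 + b^3)$$

$$a^3 = f^3\left(LW^{3,2} f^2\left(LW^{2,1} f^1\left(IW^{1,1}p + b^1\right) + b^2\right) + b^3\right) = y$$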




Linear Neuron With Two-Element Vector Input

[Figure: inputs and linear neuron]

$$a = \mathrm{purelin}(Wp + b)$$

Dynamic Network With One Delay

[Figure: inputs and linear neuron with one delay]

$$a(t) = w_{1,1}\,p(t) + w_{1,2}\,p(t-1)$$
Introduction


This chapter has a number of objectives. First we want to introduce you to
learning rules, methods of deriving the next changes that might be made in a
network, and training, a procedure whereby a network is actually adjusted to
do a particular job. Along the way we discuss a toolbox function to create a
simple perceptron network, and we also cover functions to initialize and
simulate such networks. We use the perceptron as a vehicle for tying these
concepts together.


Rosenblatt [Rose61] created many variations of the perceptron. One of the
simplest was a single-layer network whose weights and biases could be trained
to produce a correct target vector when presented with the corresponding input
vector. The training technique used is called the perceptron learning rule. The
perceptron generated great interest due to its ability to generalize from its
training vectors and learn from initially randomly distributed connections.
Perceptrons are especially suited for simple problems in pattern classification.
They are fast and reliable networks for the problems they can solve. In
addition, an understanding of the operations of the perceptron provides a good
basis for understanding more complex networks.


In this chapter we define what we mean by a learning rule, explain the 
perceptron network and its learning rule, and tell you how to initialize and 
simulate perceptron networks. 


The discussion of perceptrons in this chapter is necessarily brief. For a more
thorough discussion, see Chapter 4, “Perceptron Learning Rule,” of [HDB1996],
which discusses the use of multiple layers of perceptrons to solve more difficult
problems beyond the capability of one layer.

You also may want to refer to the original book on the perceptron, Rosenblatt,
F., Principles of Neurodynamics, Washington D.C.: Spartan Press, 1961
[Rose61].







Important Perceptron Functions


Entering help percept at the MATLAB command line displays all the
functions that are related to perceptrons.

Perceptron networks can be created with the function newp. These networks
can be initialized, simulated, and trained with init, sim, and train. The
following material describes how perceptrons work and introduces these
functions.
A perceptron neuron, which uses the hard-limit transfer function hardlim, is
shown below. 


[Figure: input and perceptron neuron]

Where...
R = number of elements in input vector

$$a = \mathrm{hardlim}(Wp + b)$$


Each external input is weighted with an appropriate weight w1j, and the sum
of the weighted inputs is sent to the hard-limit transfer function, which also
has an input of 1 transmitted to it through the bias. The hard-limit transfer
function, which returns a 0 or a 1, is shown below.


[Figure: Hard-Limit Transfer Function, a = hardlim(n)]


The perceptron neuron produces a 1 if the net input into the transfer function
is equal to or greater than 0; otherwise it produces a 0.


The hard-limit transfer function gives a perceptron the ability to classify input
vectors by dividing the input space into two regions. Specifically, outputs will
be 0 if the net input n is less than 0, or 1 if the net input n is 0 or greater. The
input space of a two-input hard limit neuron with the weights
$w_{1,1} = -1$, $w_{1,2} = 1$ and a bias $b = 1$ is shown below.


[Figure: input space of the two-input hard-limit neuron, divided by the decision boundary L]
Two classification regions are formed by the decision boundary line L at
Wp + b = 0. This line is perpendicular to the weight matrix W and shifted
according to the bias b. Input vectors above and to the left of the line L will
result in a net input greater than 0 and, therefore, cause the hard-limit neuron
to output a 1. Input vectors below and to the right of the line L cause the neuron
to output 0. The dividing line can be oriented and moved anywhere to classify
the input space as desired by picking the weight and bias values.
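The classification can be checked numerically for the weights w1,1 = -1, w1,2 = 1 and bias b = 1. This is a minimal plain-Python sketch. The functions hardlim and perceptron here are stand-ins written for illustration, not the toolbox functions. The three test points are chosen arbitrarily: one on each side of L and one exactly on it, where hardlim(0) = 1.

```python
def hardlim(n):
    # Hard-limit transfer function: 1 if n >= 0, otherwise 0
    return 1 if n >= 0 else 0


def perceptron(w, b, p):
    # Single perceptron neuron: a = hardlim(w . p + b)
    return hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)


w, b = [-1, 1], 1
print(perceptron(w, b, [-1, 1]))  # 1: above and to the left of L (n = 3)
print(perceptron(w, b, [1, -1]))  # 0: below and to the right of L (n = -1)
print(perceptron(w, b, [0, -1]))  # 1: exactly on L (n = 0)
```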


Hard-limit neurons without a bias will always have a classification line going
through the origin. Adding a bias allows the neuron to solve problems where
the two sets of input vectors are not located on different sides of the origin. The
bias allows the decision boundary to be shifted away from the origin, as shown
in the plot above.


You may want to run the demonstration program nnd4db. With it you can move
a decision boundary around, pick new inputs to classify, and see how the
repeated application of the learning rule yields a network that does classify the
input vectors properly.







Perceptron Architecture


The perceptron network consists of a single layer of S perceptron neurons
connected to R inputs through a set of weights wi,j as shown below in two
forms. As before, the network indices i and j indicate that wi,j is the strength
of the connection from the jth input to the ith neuron.


[Figure: input 1 and perceptron layer, shown in full and in abbreviated notation as layer 1]

$$a^1 = \mathrm{hardlim}(IW^{1,1}p^1 + b^1)$$

Where...
R = number of elements in input 1
S1 = number of neurons in layer 1


The perceptron learning rule that we will describe shortly is capable of training
only a single layer. Thus, here we will consider only one-layer networks. This
restriction places limitations on the computation a perceptron can perform.
The types of problems that perceptrons are capable of solving are discussed
later in this chapter in the “Limitations and Cautions” section.
Creating a Perceptron (newp)


A perceptron can be created with the function newp

net = newp(PR,S)
where input arguments:

PR is an R-by-2 matrix of minimum and maximum values for R input
elements.

S is the number of neurons.


Commonly the hardlim function is used in perceptrons, so it is the default. 


The code below creates a perceptron network with a single one-element input
vector and one neuron. The range for the single element of the single input
vector is [0 2].


net = newp([0 2],1); 

We can see what network has been created by executing the following code:
inputweights = net.inputweights{1,1}

which yields: 


inputweights =
        delays: 0
       initFcn: 'initzero'
         learn: 1
      learnFcn: 'learnp'
    learnParam: []
          size: [1 1]
      userdata: [1x1 struct]
     weightFcn: 'dotprod'


Note that the default learning function is learnp, which is discussed later in
this chapter. The weight function is dotprod, which generates the product of
the input vector and weight matrix; the bias is then added to this product to
compute the net input to the hardlim transfer function.


Also note that the default initialization function, initzero, is used to set the
initial values of the weights to zero.


Similarly,

biases = net.biases{1}
gives
biases =
       initFcn: 'initzero'
         learn: 1
      learnFcn: 'learnp'
    learnParam: []
          size: 1
      userdata: [1x1 struct]


We can see that the default initialization for the bias is also 0. 


Simulation (sim)
To show how sim works we examine a simple problem.


Suppose we take a perceptron with a single two-element input vector, like that
discussed in the decision boundary figure. We define the network with


net = newp([-2 2;-2 +2],1);


As noted above, this gives us zero weights and biases, so if we want a particular
set other than zeros, we have to create them. We can set the two weights and
the one bias to -1, 1, and 1, as they were in the decision boundary figure, with the
following two lines of code.


net.IW{1,1}= [-1 1]; 
net.b{1} = [1]; 
To make sure that these parameters were set correctly, we check them with

net.IW{1,1}
ans =
    -1     1
net.b{1}
ans =
     1


Now let us see if the network responds to two signals, one on each side of the
perceptron boundary. 


p1 = [1;1]







a1 = sim(net,p1)
a1 =
     1
and for
p2 = [1;-1]
a2 = sim(net,p2)
a2 =
     0


Sure enough, the perceptron classified the two inputs correctly. 


Note that we could present the two inputs in a sequence and get the outputs in 
a sequence as well.


p3 = {[1;1] [1;-1]};
a3 = sim(net,p3)
a3 =
    [1]    [0]


You may want to read more about sim in “Advanced Topics” in Chapter 12.


Initialization (init)
You can use the function init to reset the network weights and biases to their 
original values. Suppose, for instance that you start with the network 


net = newp([-2 2;-2 +2],1); 
Now check its weights with
wts = net.IW{1,1}
which gives, as expected,
wts =
     0     0


In the same way, you can verify that the bias is 0 with 


bias = net.b{1}
which gives
bias =
     0


Now set the weights to the values 3 and 4 and the bias to the value 5 with


net.IW{1,1} = [3,4] 
net.b{1} = 5; 


Recheck the weights and bias as shown above to verify that the change has 
been made. Sure enough,

wts =
     3     4
bias =
     5
Now use init to reset the weights and bias to their original values.
net = init(net);
We can check as shown above to verify that.

wts =
     0     0
bias =
     0


We can change the way that a perceptron is initialized with init. For instance,
we can redefine the network input weights and bias initFcns as rands, and
then apply init as shown below.

net.inputweights{1,1}.initFcn = 'rands';
net.biases{1}.initFcn = 'rands';
net = init(net);


Now check on the weights and bias.

wts =
    0.2309    0.5839
biases =
   -0.1106


We can see that the weights and bias have been given random numbers.


You may want to read more about init in “Advanced Topics” in Chapter 12.
We define a learning rule as a procedure for modifying the weights and biases
of a network. (This procedure may also be referred to as a training algorithm.)
The learning rule is applied to train the network to perform some particular
task. Learning rules in this toolbox fall into two broad categories: supervised
learning and unsupervised learning.


In supervised learning, the learning rule is provided with a set of examples (the
training set) of proper network behavior:

$$\{p_1, t_1\}, \{p_2, t_2\}, \dots, \{p_Q, t_Q\}$$


where $p_q$ is an input to the network, and $t_q$ is the corresponding correct
(target) output. As the inputs are applied to the network, the network outputs
are compared to the targets. The learning rule is then used to adjust the
weights and biases of the network in order to move the network outputs closer
to the targets. The perceptron learning rule falls in this supervised learning
category. 


In unsupervised learning, the weights and biases are modified in response to
network inputs only. There are no target outputs available. Most of these
algorithms perform clustering operations. They categorize the input patterns
into a finite number of classes. This is especially useful in such applications as
vector quantization.


As noted, the perceptron discussed in this chapter is trained with supervised 
learning. Hopefully, a network that produces the right output for a particular 
input will be obtained. 







Perceptron Learning Rule (learnp)


Perceptrons are trained on examples of desired behavior. The desired behavior
can be summarized by a set of input, output pairs

$$\{p_1, t_1\}, \{p_2, t_2\}, \dots, \{p_Q, t_Q\}$$

where p is an input to the network and t is the corresponding correct (target)
output. The objective is to reduce the error e, which is the difference t - a
between the neuron response a and the target vector t. The perceptron
learning rule learnp calculates desired changes to the perceptron's weights
and biases given an input vector p and the associated error e. The target
vector t must contain values of either 0 or 1, as perceptrons (with hardlim
transfer functions) can only output such values.


Each time learnp is executed, the perceptron has a better chance of producing
the correct outputs. The perceptron rule is proven to converge on a solution in
a finite number of iterations if a solution exists.


If a bias is not used, learnp works to find a solution by altering only the weight
vector w to point toward input vectors to be classified as 1, and away from
vectors to be classified as 0. This results in a decision boundary that is
perpendicular to w, and which properly classifies the input vectors.


There are three conditions that can occur for a single neuron once an input
vector p is presented and the network's response a is calculated:


CASE 1. If an input vector is presented and the output of the neuron is correct
(a = t and e = t - a = 0), then the weight vector w is not altered.


CASE 2. If the neuron output is 0 and should have been 1 (a = 0 and t = 1, and
e = t - a = 1), the input vector p is added to the weight vector w. This makes
the weight vector point closer to the input vector, increasing the chance that
the input vector will be classified as a 1 in the future.


CASE 3. If the neuron output is 1 and should have been 0 (a = 1 and t = 0, and
e = t - a = -1), the input vector p is subtracted from the weight vector w. This
makes the weight vector point farther away from the input vector, increasing
the chance that the input vector is classified as a 0 in the future.


The perceptron learning rule can be written more succinctly in terms of the
error e = t - a and the change to be made to the weight vector, $\Delta\mathbf{w}$:


CASE 1. If e = 0, then make a change $\Delta\mathbf{w}$ equal to 0.


CASE 2. If e = 1, then make a change $\Delta\mathbf{w}$ equal to $\mathbf{p}^T$.


CASE 3. If e = -1, then make a change $\Delta\mathbf{w}$ equal to $-\mathbf{p}^T$.


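All three cases can then be written with a single expression:

$$\Delta\mathbf{w} = (t - a)\,\mathbf{p}^T = e\,\mathbf{p}^T$$


We can get the expression for changes in a neuron's bias by noting that the bias
is simply a weight that always has an input of 1:

$$\Delta b = (t - a)(1) = e$$

For the case of a layer of neurons we have:

$$\Delta\mathbf{W} = (\mathbf{t} - \mathbf{a})\,\mathbf{p}^T = \mathbf{e}\,\mathbf{p}^T \quad\text{and}\quad \Delta\mathbf{b} = (\mathbf{t} - \mathbf{a}) = \mathbf{e}$$

The perceptron learning rule can be summarized as follows:

$$\mathbf{W}^{\text{new}} = \mathbf{W}^{\text{old}} + \mathbf{e}\,\mathbf{p}^T \quad\text{and}\quad \mathbf{b}^{\text{new}} = \mathbf{b}^{\text{old}} + \mathbf{e}$$

where $\mathbf{e} = \mathbf{t} - \mathbf{a}$.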


Now let us try a simple example. We start with a single neuron having an input
vector with just two elements.


net = newp([-2 2;-2 +2],1); 
To simplify matters we set the bias equal to 0 and the weights to 1 and -0.8. 


net.b{1} = [0]; 
w = [1 -0.8];
net.IW{1,1} = w;


The input target pair is given by 


p = [1; 2];
t = [1];


We can compute the output and error with 


a = sim(net,p)
a =
     0
e = t - a
e =
     1


and finally use the function learnp to find the change in the weights. 


dw = learnp(w,p,[],[],[],[],e,[],[],[])
dw =
     1     2


The new weights, then, are obtained as
w = w + dw
w =
    2.0000    1.2000
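The single-step example above can be checked with a few lines of plain Python. This sketch reproduces only the arithmetic (it does not call sim or learnp); the variable names mirror the MATLAB session, and hardlim is a hypothetical stand-in for the toolbox function.

```python
def hardlim(n):
    # Hard limit: 1 for n >= 0, 0 otherwise
    return 1 if n >= 0 else 0


w = [1.0, -0.8]  # weights set by hand, as in net.IW{1,1} = w
b = 0.0          # bias set to 0, as in net.b{1} = [0]
p = [1.0, 2.0]   # input vector
t = 1            # target

n = sum(wi * pi for wi, pi in zip(w, p)) + b  # net input, about -0.6
a = hardlim(n)                                # output: 0
e = t - a                                     # error: 1
dw = [e * pi for pi in p]                     # weight change e*p^T: [1.0, 2.0]
w = [wi + dwi for wi, dwi in zip(w, dw)]      # new weights: about [2.0, 1.2]
```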


The process of finding new weights (and biases) can be repeated until there are
no errors. Note that the perceptron learning rule is guaranteed to converge in
a finite number of steps for all problems that can be solved by a perceptron.
These include all classification problems that are "linearly separable." The
objects to be classified in such cases can be separated by a single line (for
two-element inputs) or, more generally, by a hyperplane.


You might want to try demo nnd4pr. It allows you to pick new input vectors and
apply the learning rule to classify them.
If sim and learnp are used repeatedly to present inputs to a perceptron, and to
change the perceptron weights and biases according to the error, the
perceptron will eventually find weight and bias values that solve the problem,
given that the perceptron can solve it. Each traverse through all of the training
input and target vectors is called a pass.
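A loop of repeated simulate-and-update steps, organized into passes, might look like the following plain-Python sketch. It illustrates the idea only and is not the toolbox's train; the function name train_perceptron and the max_passes cap are assumptions. It stops after the first pass in which no input produces an error.

```python
def hardlim(n):
    # Hard limit: 1 for n >= 0, 0 otherwise
    return 1 if n >= 0 else 0


def train_perceptron(P, T, w, b, max_passes=100):
    """Cyclic (one-input-at-a-time) perceptron training.

    P: list of input vectors, T: list of 0/1 targets, w, b: initial values.
    Each pass presents every (p, t) pair in order and applies the
    perceptron rule after each presentation. Training stops after the
    first pass with no errors, or after max_passes passes.
    Returns (w, b, passes_used, converged).
    """
    for passes in range(1, max_passes + 1):
        errors = 0
        for p, t in zip(P, T):
            a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
            e = t - a
            if e != 0:
                errors += 1
                w = [wi + e * pi for wi, pi in zip(w, p)]  # w + e*p^T
                b = b + e                                   # b + e
        if errors == 0:
            return w, b, passes, True
    return w, b, max_passes, False


# Four-vector problem from the hand calculation, starting from zeros
P = [[2, 2], [1, -2], [-2, 2], [-1, 1]]
T = [0, 1, 0, 1]
w_final, b_final, n_passes, ok = train_perceptron(P, T, [0, 0], 0)
```

With these inputs the sketch returns w = [-2, -3] and b = 1 after three passes: errors occur in the first two passes, and the third pass only confirms that every input is classified correctly.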


The function train carries out such a loop of calculation. In each pass the 
function train proceeds through the specified sequence of inputs, calculating
the output, error and network adjustment for each input vector in the sequence 
as the inputs are presented. 


Note that train does not guarantee that the resulting network does its job.
The new values of W and b must be checked by computing the network output
for each input vector to see if all targets are reached. If a network does not
perform successfully it can be trained further by again calling train with the
new weights and biases for more training passes, or the problem can be
analyzed to see if it is a suitable problem for the perceptron. Problems that
are not solvable by the perceptron network are discussed in the "Limitations
and Cautions" section.


To illustrate the training procedure, we will work through a simple problem.
Consider a one-neuron perceptron with a single vector input having two
elements.


[Figure: Input, Perceptron Neuron]

$$a = \mathrm{hardlim}(\mathbf{W}\mathbf{p} + b)$$


This network, and the problem we are about to consider, are simple enough that
you can follow through what is done with hand calculations if you want. The
problem discussed below follows that found in [HDB1996].
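Let us suppose we have the following classification problem and would like to
solve it with our single vector input, two-element perceptron network.


$$\left\{\mathbf{p}_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, t_1 = 0\right\} \quad \left\{\mathbf{p}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, t_2 = 1\right\} \quad \left\{\mathbf{p}_3 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}, t_3 = 0\right\} \quad \left\{\mathbf{p}_4 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, t_4 = 1\right\}$$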


Use the initial weights and bias given below. We denote the variables at each step of this
calculation by using a number in parentheses after the variable. Thus the
initial values are W(0) and b(0):


$$\mathbf{W}(0) = \begin{bmatrix} 0 & 0 \end{bmatrix}, \quad b(0) = 0$$


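We start by calculating the perceptron's output a for the first input vector $\mathbf{p}_1$,
using the initial weights and bias.


$$a = \mathrm{hardlim}(\mathbf{W}(0)\mathbf{p}_1 + b(0)) = \mathrm{hardlim}\left(\begin{bmatrix} 0 & 0 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \end{bmatrix} + 0\right) = \mathrm{hardlim}(0) = 1$$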


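The output a does not equal the target value $t_1$, so we use the perceptron rule
to find the incremental changes to the weights and biases based on the error.

$$e = t_1 - a = 0 - 1 = -1$$
$$\Delta\mathbf{W} = e\,\mathbf{p}_1^T = (-1)\begin{bmatrix} 2 & 2 \end{bmatrix} = \begin{bmatrix} -2 & -2 \end{bmatrix}$$
$$\Delta b = e = (-1) = -1$$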


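You can calculate the new weights and bias using the perceptron update rules
shown previously.


$$\mathbf{W}^{\text{new}} = \mathbf{W}^{\text{old}} + e\,\mathbf{p}^T = \begin{bmatrix} 0 & 0 \end{bmatrix} + \begin{bmatrix} -2 & -2 \end{bmatrix} = \begin{bmatrix} -2 & -2 \end{bmatrix} = \mathbf{W}(1)$$
$$b^{\text{new}} = b^{\text{old}} + e = 0 + (-1) = -1 = b(1)$$

Now present the next input vector, $\mathbf{p}_2$. The output is calculated below.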
$$a = \mathrm{hardlim}(\mathbf{W}(1)\mathbf{p}_2 + b(1)) = \mathrm{hardlim}\left(\begin{bmatrix} -2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ -2 \end{bmatrix} - 1\right) = \mathrm{hardlim}(1) = 1$$

On this occasion, the target is 1, so the error is zero. Thus there are no changes
in weights or bias, so $\mathbf{W}(2) = \mathbf{W}(1) = [-2\ \ -2]$ and $b(2) = b(1) = -1$.


We can continue in this fashion, presenting $\mathbf{p}_3$ next, calculating an output and
the error, and making changes in the weights and bias, etc. After making one
pass through all of the four inputs, you get the values $\mathbf{W}(4) = [-3\ \ -1]$ and
$b(4) = 0$. To determine if we obtained a satisfactory solution, we must make
one pass through all input vectors to see if they all produce the desired target
values. This is not yet the case (for example, the second input vector is still
misclassified), but the algorithm does converge on the sixth presentation of an
input. The final values are:


$$\mathbf{W}(6) = \begin{bmatrix} -2 & -3 \end{bmatrix} \quad\text{and}\quad b(6) = 1$$
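The hand calculation can be replayed step by step with the plain-Python sketch below, which records W(k) and b(k) after each of the six presentations. It is an illustration, not toolbox code, and it assumes hardlim(0) = 1, as in the calculation of the first output above.

```python
def hardlim(n):
    # Hard limit: 1 for n >= 0, 0 otherwise
    return 1 if n >= 0 else 0


P = [[2, 2], [1, -2], [-2, 2], [-1, 1]]  # p1..p4
T = [0, 1, 0, 1]                         # t1..t4

w, b = [0, 0], 0       # W(0) and b(0)
history = [(w, b)]     # history[k] holds (W(k), b(k))
for k in range(6):     # presentations 1..6: p1, p2, p3, p4, p1, p2
    p, t = P[k % 4], T[k % 4]
    e = t - hardlim(w[0] * p[0] + w[1] * p[1] + b)
    w = [w[0] + e * p[0], w[1] + e * p[1]]  # W(k+1) = W(k) + e p^T
    b = b + e                               # b(k+1) = b(k) + e
    history.append((w, b))

for k, (wk, bk) in enumerate(history):
    print(f"W({k}) = {wk}, b({k}) = {bk}")
```

The recorded values include W(4) = [-3, -1] with b(4) = 0 and W(6) = [-2, -3] with b(6) = 1, matching the text.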


This concludes our hand calculation. Now, how can we do this using the train
function?


The following code defines a perceptron like that shown in the previous figure,
with initial weights and bias values of 0.


net = newp([-2 2;-2 +2],1); 
Now consider the application of a single input. 
p = [2; 2];
having the target
t = [0];


Now set epochs to 1, so that train will go through the input vectors (only one
here) just one time.


net.trainParam.epochs = 1; 
net = train(net,p,t);


The new weights and bias are


w =
    -2    -2
b =
    -1


Thus, the initial weights and bias are 0, and after training on only the first
vector, they have the values [-2 -2] and -1, just as we hand calculated.


We now apply the second input vector $\mathbf{p}_2$. The output is 1, as it will be until the
weights and bias are changed, but now the target is 1, so the error will be 0 and
the change will be zero. We could proceed in this way, starting from the
previous result and applying a new input vector time after time. But we can do
this job automatically with train.


Now let's apply train for one epoch, a single pass through the sequence of all 
four input vectors. Start with the network definition.


net = newp([-2 2;-2 +2],1); 
net.trainParam.epochs = 1; 

The input vectors and targets are 
p = [[2;2] [1;-2] [-2;2] [-1;1]]
t = [0 1 0 1]

Now train the network with 


net = train(net,p,t);


The new weights and bias are 


w =
    -3    -1
b =
     0


Note that this is the same result as we got previously by hand. Finally, simulate
the trained network for each of the inputs.


a = sim(net,p)
a =
    [0]    [0]    [1]    [1]


The outputs do not yet equal the targets, so we need to train the network for 
more than one pass. We will try four epochs. This run gives the following 
results. 
TRAINC, Epoch 0/20
TRAINC, Epoch 3/20
TRAINC, Performance goal met.


Thus, the network was trained by the time the inputs were presented on the
third epoch. (As we know from our hand calculation, the network converges on
the presentation of the sixth input vector. This occurs in the middle of the
second epoch, but it takes the third epoch to detect the network convergence.)
The final weights and bias are


w =
    -2    -3
b =
     1
The simulated output and errors for the various inputs are


a =
         0    1.00         0    1.00
error = [a(1)-t(1) a(2)-t(2) a(3)-t(3) a(4)-t(4)]
error =
     0     0     0     0


Thus, we have checked that the training procedure was successful. The
network converged and produces the correct target outputs for the four input
vectors.


Note that the default training function for networks created with newp is
trainc. (You can find this by executing net.trainFcn.) This training function
applies the perceptron learning rule in its pure form, in that individual input
vectors are applied individually in sequence, and corrections to the weights and
bias are made after each presentation of an input vector. Thus, perceptron
training with train will converge in a finite number of steps unless the
problem presented cannot be solved with a simple perceptron.


The function train can be used in various ways by other networks as well. Type
help train to read more about this basic function.


You may want to try various demonstration programs. For instance, demop1
illustrates classification and training of a simple perceptron.


Limitations and Cautions


Perceptron networks should be trained with adapt, which presents the input
vectors to the network one at a time and makes corrections to the network
based on the results of each presentation. Use of adapt in this way guarantees
that any linearly separable problem is solved in a finite number of training
presentations. Perceptrons can also be trained with the function train, which
is presented in the next chapter. When train is used for perceptrons, it
presents the inputs to the network in batches, and makes corrections to the
network based on the sum of all the individual corrections. Unfortunately,
there is no proof that such a training algorithm converges for perceptrons. On
that account the use of train for perceptrons is not recommended.


Perceptron networks have several limitations. First, the output values of a
perceptron can take on only one of two values (0 or 1) due to the hard-limit
transfer function. Second, perceptrons can only classify linearly separable sets
of vectors. If a straight line or a plane can be drawn to separate the input
vectors into their correct categories, the input vectors are linearly separable. If
the vectors are not linearly separable, learning will never reach a point where
all vectors are classified properly. Note, however, that it has been proven that
if the vectors are linearly separable, perceptrons trained adaptively will always
find a solution in finite time. You might want to try demop6. It shows the
difficulty of trying to classify input vectors that are not linearly separable.
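To see the non-separable case concretely, the following plain-Python sketch (an illustration, not demop6) runs cyclic perceptron training on the two-input XOR problem and on AND. The helper name errorless_pass_reached and the cap of 1000 passes are arbitrary choices made for this example.

```python
def hardlim(n):
    # Hard limit: 1 for n >= 0, 0 otherwise
    return 1 if n >= 0 else 0


def errorless_pass_reached(P, T, max_passes=1000):
    """Train a two-input perceptron cyclically from zero weights.

    Returns True as soon as one full pass produces no errors, and
    False if that never happens within max_passes passes.
    """
    w, b = [0, 0], 0
    for _ in range(max_passes):
        errors = 0
        for p, t in zip(P, T):
            e = t - hardlim(w[0] * p[0] + w[1] * p[1] + b)
            if e != 0:
                errors += 1
                w = [w[0] + e * p[0], w[1] + e * p[1]]
                b += e
        if errors == 0:
            return True
    return False


P = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_ok = errorless_pass_reached(P, [0, 0, 0, 1])  # AND: linearly separable
xor_ok = errorless_pass_reached(P, [0, 1, 1, 0])  # XOR: not linearly separable
```

A finite run can only illustrate the limitation. The underlying reason XOR never succeeds, however large max_passes is, is that an error-free pass would require a single line separating the XOR classes, and no such line exists.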


It is only fair, however, to point out that networks with more than one
perceptron can be used to solve more difficult problems. For instance, suppose
that you have a set of four vectors that you would like to classify into distinct
groups, and that two lines can be drawn to separate them. A two-neuron
network can be found such that its two decision boundaries classify the inputs
into four categories. For additional discussion about perceptrons and to
examine more complex perceptron problems, see [HDB1996].
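As an illustration of the two-neuron case, here is a plain-Python sketch that trains a layer of two perceptrons on four hypothetical input vectors, one per quadrant, each assigned its own 2-bit target code. The data, the function names simulate and train_layer, and the pass limit are assumptions made for this example.

```python
def hardlim(n):
    # Hard limit: 1 for n >= 0, 0 otherwise
    return 1 if n >= 0 else 0


def simulate(W, b, p):
    # a = hardlim(Wp + b), one entry per neuron in the layer
    return [hardlim(sum(w_ij * p_j for w_ij, p_j in zip(row, p)) + b_i)
            for row, b_i in zip(W, b)]


def train_layer(P, T, W, b, max_passes=100):
    """Cyclic perceptron-rule training of a whole layer.

    Stops after the first pass with no errors; returns (W, b, converged).
    """
    for _ in range(max_passes):
        clean_pass = True
        for p, t in zip(P, T):
            e = [t_i - a_i for t_i, a_i in zip(t, simulate(W, b, p))]
            if any(e):
                clean_pass = False
                # W_new = W_old + e p^T, b_new = b_old + e
                W = [[w_ij + e_i * p_j for w_ij, p_j in zip(row, p)]
                     for row, e_i in zip(W, e)]
                b = [b_i + e_i for b_i, e_i in zip(b, e)]
        if clean_pass:
            return W, b, True
    return W, b, False


# Hypothetical data: one vector per quadrant, each with its own 2-bit code
P = [[1, 1], [-1, 1], [-1, -1], [1, -1]]
T = [[1, 1], [0, 1], [0, 0], [1, 0]]
W, b, converged = train_layer(P, T, [[0, 0], [0, 0]], [0, 0])
outputs = [simulate(W, b, p) for p in P]
```

In this run each neuron ends up with one of the two coordinate axes as its decision boundary, and the pair of outputs identifies which of the four categories an input belongs to.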


Outliers and the Normalized Perceptron Rule


Long training times can be caused by the presence of an outlier input vector
whose length is much larger or smaller than the other input vectors. Applying
the perceptron learning rule involves adding and subtracting input vectors
from the current weights and biases in response to error. Thus, an input vector
with large elements can lead to changes in the weights and biases that take a
long time for a much smaller input vector to overcome. You might want to try
demop4 to see how an outlier affects the training.
By changing the perceptron learning rule slightly, training times can be made
insensitive to extremely large or small outlier input vectors.


Here is the original rule for updating weights:

$$\Delta\mathbf{w} = (t - a)\,\mathbf{p}^T = e\,\mathbf{p}^T$$


As shown above, the larger an input vector p, the larger its effect on the weight
vector w. Thus, if an input vector is much larger than other input vectors, the
smaller input vectors must be presented many times to have an effect.


The solution is to normalize the rule so that the effect of each input vector on the
weights is of the same magnitude:


$$\Delta\mathbf{w} = (t - a)\,\frac{\mathbf{p}^T}{\|\mathbf{p}\|} = e\,\frac{\mathbf{p}^T}{\|\mathbf{p}\|}$$


The normalized perceptron rule is implemented with the function learnpn,
which is called exactly like learnp. The normalized perceptron rule function
learnpn takes slightly more time to execute, but reduces the number of epochs
considerably if there are outlier input vectors. You might try demop5 to see how
this normalized training rule works.
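The following plain-Python sketch compares the size of a single weight change under the original rule and under the normalized formula above, for one hypothetical outlier and one typical input. It follows the formula literally and is not a reimplementation of learnpn; the input values and helper names are made up for illustration.

```python
import math


def update_original(p, e):
    # Original rule: delta w = e * p^T, which scales with the length of p
    return [e * pj for pj in p]


def update_normalized(p, e):
    # Normalized rule: delta w = e * p^T / ||p||, so ||delta w|| = |e| for any nonzero p
    norm = math.sqrt(sum(pj * pj for pj in p))
    return [e * pj / norm for pj in p]


def length(v):
    # Euclidean length of a vector
    return math.sqrt(sum(x * x for x in v))


outlier = [300.0, 400.0]  # hypothetical outlier, length 500
typical = [0.3, 0.4]      # hypothetical typical input, length 0.5

# Under the original rule one outlier step is about 1000 times longer than a typical step
ratio_original = length(update_original(outlier, 1)) / length(update_original(typical, 1))
# Under the normalized rule both steps have length 1
ratio_normalized = length(update_normalized(outlier, 1)) / length(update_normalized(typical, 1))
```

Because every normalized step has the same length, no single input can push the weights so far that the other inputs need many presentations to pull them back.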
Graphical User Interface


Introduction to the GUI


The graphical user interface (GUI) is designed to be simple and user friendly,
but we will go through a simple example to get you started.


In what follows you bring up a GUI Network/Data Manager window. This
window has its own work area, separate from the more familiar command line
workspace. Thus, when using the GUI, you might "export" the GUI results to
the (command line) workspace. Similarly you may want to "import" results
from the command line workspace to the GUI.


Once the Network/Data Manager is up and running, you can create a
network, view it, train it, simulate it, and export the final results to the
workspace. Similarly, you can import data from the workspace for use in the
GUI.


The following example deals with a perceptron network. We go through all the
steps of creating a network and show you what you might expect to see as you
go along.


Create a Perceptron Network (nntool)


We create a perceptron network to perform the AND function in this example.
It has an input vector p = [0 0 1 1; 0 1 0 1] and a target vector t = [0 0 0 1].
We call the network ANDNet. Once created, the network will be trained. We can
then save the network, its output, etc., by "exporting" it to the command line.


Input and target
To start, type nntool. The following window appears.
[Figure: Network/Data Manager window, with lists for Networks, Inputs, Input Delay States, and Layer Delay States; Networks and Data buttons Help, New Data..., New Network..., Import..., and Export...; and a row of Networks only buttons.]


Click on Help to get started on a new problem and to see descriptions of the
buttons and lists.


First, we want to define the network input, which we call p, as having the
particular value [0 0 1 1; 0 1 0 1]. Thus, the network has a two-element input
and four sets of such two-element vectors are presented to it in training. To
define this data, click on New Data, and a new window, Create New Data,
appears. Set the Name to p, the Value to [0 0 1 1; 0 1 0 1], and make sure that
Data Type is set to Inputs. The Create New Data window will then look like
this:
[Figure: Create New Data window, with Data Type options Inputs (selected), Targets, Input Delay States, Layer Delay States, Outputs, and Errors, and buttons Help, Cancel, and Create.]


Now click Create to actually create the input p. The Network/Data
Manager window comes up and p shows as an input.





Next we create a network target. Click on New Data again, and this time enter
the variable name t, specify the value [0 0 0 1], and click on Targets under
data type. Again click on Create and you will see in the resulting Network/Data
Manager window that you now have t as a target as well as the previous
p as an input.


Create Network 

Now we want to create a new network, which we will call ANDNet. To do this,
click on New Network, and a Create New Network window appears. Enter
ANDNet under Network Name. Set the Network Type to Perceptron, for that
is the kind of network we want to create. The input ranges can be set by
entering numbers in that field, but it is easier to get them from the particular
input data that you want to use. To do this, click on the down arrow at the right
side of Input Range. This pull-down menu shows that you can get the input
ranges from p if you want. That is what we want to do, so click on p.
This should lead to input ranges [0 1; 0 1]. We want to use a hardlim transfer
function and a learnp learning function, so set those values using the arrows
for Transfer function and Learning function, respectively. By now your
Create New Network window should look like:
[Figure: Create New Network window, with Network Name ANDNet, Network Type Perceptron, Input ranges [0 1; 0 1] (taken from input p), Number of neurons, Transfer function hardlim, and Learning function learnp, and buttons View, Defaults, Cancel, and Create.]
Next you might look at the network by clicking on View. For example: 


[Figure: View of New Network window]
This picture shows that you are about to create a network with a single input
(composed of two elements), a hardlim transfer function, and a single output.
This is the perceptron network that we wanted.


Now click Create to generate the network. You will get back the Network/ 
Data Manager window. Note that ANDNet is now listed as a network.
Train the Perceptron


To train the network, click on ANDNet to highlight it. Then click on Train. This
leads to a new window labeled Network:ANDNet. At this point you can view
the network again by clicking on the top tab View. You can also check on the
initialization by clicking on the top tab Initialize. Now click on the top tab
Train. Specify the inputs and output by clicking on the left tab Training Info
and selecting p from the pull-down list of inputs and t from the pull-down list
of targets. The Network:ANDNet window should look like:


[Figure: Network:ANDNet window with top tabs View, Initialize, Simulate, Train, Adapt, and Weights; left tabs Training Info, Training Parameters, and Optional Info; a Training Data panel (Inputs, Targets) and a Training Results panel (Outputs ANDNet_outputs, Errors ANDNet_errors, and final input and layer states); and buttons Manager and Train Network.]


Note that the Training Result Outputs and Errors have the name ANDNet
prefixed to them (ANDNet_outputs and ANDNet_errors). This makes them easy to
identify later when they are exported to the command line.





While you are here, click on the Training Parameters tab. It shows you
parameters such as the epochs and error goal. You can change these
parameters at this point if you want.


Now click Train Network to train the perceptron network. You will see the 
following training results. 
[Figure: Training with TRAINC window, plotting the training performance (blue) against epochs, with a Stop Training button; training ended after 4 epochs.]


Thus, the network was trained to zero error in four epochs. (Note that other
kinds of networks commonly do not train to zero error and their errors
commonly cover a much larger range. On that account, we plot their errors on
a log scale rather than on a linear scale such as that used above for
perceptrons.)





You can check that the trained network does indeed give zero error by using
the input p and simulating the network. To do this, go to the Network/Data
Manager window and click on Simulate (under Networks only). This will bring up
the Network:ANDNet window. Click there on Simulate. Now use the Input
pull-down menu to specify p as the input, and label the output as
ANDNet_outputsSim to distinguish it from the training output. Now click
Simulate Network in the lower right corner. Look at the Network/Data
Manager and you will see a new variable in the output: ANDNet_outputsSim.
Double-click on it and a small window Data:ANDNet_outputsSim appears
with the value


[0 0 0 1]


Thus, the network does perform the AND of the inputs, giving a 1 as an output
only in this last case, when both inputs are 1.


Export Perceptron Results to Workspace 

To export the network outputs and errors to the MATLAB command line
workspace, click in the lower left of the Network:ANDNet window to go back
to the Network/Data Manager. Note that the output and error for the ANDNet
are listed in the Outputs and Error lists on the right side. Next click on
Export. This will give you an Export or Save from Network/Data Manager
window. Click on ANDNet_outputs and ANDNet_errors to highlight them, and
then click the Export button. These two variables now should be in the
command line workspace. To check this, go to the command line and type who
to see all the defined variables. The result should be


who 
Your variables are: 
ANDNet_errors ANDNet_outputs 


You might type ANDNet_outputs and ANDNet_errors to obtain the following 


ANDNet_outputs = 
0 0 0 1 


and 


ANDNet_errors = 
     0     0     0     0


You can export p, t, and ANDNet in a similar way. You might do this and check 
with who to make sure that they got to the command line. 


Now that ANDNet is exported you can view the network description and
examine the network weight matrix. For instance, the command


ANDNet.IW{1,1}
gives


ans =
     2     1


Similarly,
ANDNet.b{1}
yields


ans =
    -3
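As a quick check on the exported values, the plain-Python sketch below applies a = hardlim(Wp + b) with W = [2 1] and b = -3 to the four columns of p. It is an illustration rather than a call to sim, and it assumes hardlim(0) = 1; the last input depends on that convention, since its net input is exactly 0.

```python
def hardlim(n):
    # Hard limit: 1 for n >= 0, 0 otherwise
    return 1 if n >= 0 else 0


w = [2, 1]  # ANDNet.IW{1,1}
b = -3      # ANDNet.b{1}
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]  # the columns of p = [0 0 1 1; 0 1 0 1]

# a = hardlim(w1*p1 + w2*p2 + b) for each input column; net inputs are -3, -2, -1, 0
outputs = [hardlim(w[0] * x1 + w[1] * x2 + b) for x1, x2 in inputs]
print(outputs)  # [0, 0, 0, 1]
```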


Clear Network/Data Window


You can clear the Network/Data Manager window by highlighting a variable
such as p and clicking the Delete button until all entries in the list boxes are
gone. By doing this, we start from a clean slate.


Alternatively, you can quit MATLAB. A restart with a new MATLAB, followed
by nntool, gives a clean Network/Data Manager window.


Recall, however, that we exported p, t, etc., to the command line from the
perceptron example. They are still there for your use even after you clear the
Network/Data Manager.


Importing from the Command Line


To make things simple, quit MATLAB. Start it again, and type nntool to begin
a new session.


Create a new vector.
r = [0; 1; 2; 3]
r =
     0
     1
     2
     3


Now click on Import, and set the destination Name to R (to distinguish
between the variable named at the command line and the variable in the GUI).
You will have a window that looks like this:


[Figure: Import or Load to Network/Data Manager window. Source: Import from MATLAB or Load from disk file; Select a Variable; Destination Name (R); Import As: Network, Inputs, Targets, Initial Input States, Initial Layer States, Outputs, or Errors; buttons Cancel and Import.]


Now click Import and verify by looking at the Network/Data Manager that
the variable R is there as an input.


Save a Variable to a File and Load It Later


Bring up the Network/Data Manager and click on New Network. Set the
name to mynet. Click on Create. The network name mynet should appear in the
Network/Data Manager. In this same manager window click on Export.
Select mynet in the variable list of the Export or Save window and click on
Save. This leads to the Save to a MAT file window. Save to a file mynetfile.


Now let's get rid of mynet in the GUI and retrieve it from the saved file. First go
to the Network/Data Manager, highlight mynet, and click Delete. Next click
on Import. This brings up the Import or Load to Network/Data Manager
window. Select the Load from Disk button and type mynetfile as the
MAT-file Name. Now click on Browse. This brings up the Select MAT file
window with mynetfile as an option that you can select as a variable to be
imported. Highlight mynetfile, press Open, and you return to the Import or
Load to Network/Data Manager window. On the Import As list, select
Network. Highlight mynet and click on Load to bring mynet to the GUI. Now
mynet is back in the GUI Network/Data Manager window.


Summary


Perceptrons are useful as classifiers. They can classify linearly separable input
vectors very well. Convergence is guaranteed in a finite number of steps,
provided the perceptron can solve the problem.


The design of a perceptron network is constrained completely by the problem
to be solved. Perceptrons have a single layer of hard-limit neurons. The number
of network inputs and the number of neurons in the layer are constrained by
the number of inputs and outputs required by the problem.


Training time is sensitive to outliers, but outlier input vectors do not stop the
network from finding a solution.


Single-layer perceptrons can solve problems only when data is linearly
separable. This is seldom the case. One solution to this difficulty is to use a
preprocessing method that results in linearly separable vectors. Or you might
use multiple perceptrons in multiple layers. Alternatively, you can use other
kinds of networks, such as backpropagation networks, which can classify
nonlinearly separable input vectors.


A graphical user interface can be used to create networks and data, train the 
networks, and export the networks and data to the command line workspace. 


Figures and Equations 


Perceptron Neuron

[Figure: Input, Perceptron Neuron. Where R = number of elements in input vector.]

$$a = \mathrm{hardlim}(\mathbf{W}\mathbf{p} + b)$$
Perceptron Transfer Function, hardlim

[Figure: Hard-Limit Transfer Function]

$$a = \mathrm{hardlim}(n)$$


Decision Boundary

[Figure: Decision Boundary]
Perceptron Architecture

[Figure: Input 1, Perceptron Layer. Where R = number of elements in input and S1 = number of neurons in layer 1.]

$$\mathbf{a}^1 = \mathrm{hardlim}(\mathbf{IW}^{1,1}\mathbf{p}^1 + \mathbf{b}^1)$$


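The Perceptron Learning Rule

$$\mathbf{W}^{\text{new}} = \mathbf{W}^{\text{old}} + \mathbf{e}\,\mathbf{p}^T$$
$$\mathbf{b}^{\text{new}} = \mathbf{b}^{\text{old}} + \mathbf{e}$$

where

$$\mathbf{e} = \mathbf{t} - \mathbf{a}$$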
One Perceptron Neuron

[Figure: Input, Perceptron Neuron]

$$a = \mathrm{hardlim}(\mathbf{W}\mathbf{p} + b)$$


New Functions
This chapter introduces the following new functions.


Function    Description
hardlim     A hard limit transfer function
initzero    Zero weight/bias initialization function
dotprod     Dot product weight function
newp        Creates a new perceptron network
sim         Simulates a neural network
init        Initializes a neural network
learnp      Perceptron learning function
learnpn     Normalized perceptron learning function
nntool      Starts the Graphical User Interface (GUI)
Introduction


The linear networks discussed in this chapter are similar to the perceptron, but
their transfer function is linear rather than hard-limiting. This allows their
outputs to take on any value, whereas the perceptron output is limited to either
0 or 1. Linear networks, like the perceptron, can only solve linearly separable
problems.


Here we will design a linear network that, when presented with a set of given
input vectors, produces outputs of corresponding target vectors. For each input
vector we can calculate the network's output vector. The difference between an
output vector and its target vector is the error. We would like to find values for
the network weights and biases such that the sum of the squares of the errors
is minimized or below a specific value. This problem is manageable because
linear systems have a single error minimum. In most cases, we can calculate a
linear network directly, such that its error is a minimum for the given input
vectors and target vectors. In other cases, numerical problems prohibit direct
calculation. Fortunately, we can always train the network to have a minimum
error by using the Least Mean Squares (Widrow-Hoff) algorithm.


Note that the use of linear filters in adaptive systems is discussed in Chapter
10.


This chapter introduces newlin, a function that creates a linear layer, and 
newlind, a function that designs a linear layer for a specific purpose. 


You can type help linnet to see a list of linear network functions,
demonstrations, and applications.


Neuron Model


A linear neuron with R inputs is shown below.


[Figure: Linear Neuron with Vector Input. Where R = number of elements in input vector.]

$$a = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b)$$
This network has the same basic structure as the perceptron. The only
difference is that the linear neuron uses a linear transfer function, which we
will give the name purelin.


[Figure: Linear Transfer Function]

$$a = \mathrm{purelin}(n)$$

The linear transfer function calculates the neuron's output by simply returning
the value passed to it.

$$a = \mathrm{purelin}(n) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \mathbf{W}\mathbf{p} + b$$


This neuron can be trained to learn an affine function of its inputs, or to find a
linear approximation to a nonlinear function. A linear network cannot, of
course, be made to perform a nonlinear computation.
Network Architecture


The linear network shown below has one layer of S neurons connected to R
inputs through a matrix of weights W.


[Figure: Input, Layer of Linear Neurons. Where R = number of elements in input vector and S = number of neurons in layer.]

$$\mathbf{a} = \mathrm{purelin}(\mathbf{W}\mathbf{p} + \mathbf{b})$$


Note that the figure on the right defines an S-length output vector a.


We have shown a single-layer linear network. However, this network is just as
capable as multilayer linear networks. For every multilayer linear network,
there is an equivalent single-layer linear network.
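The reason is that purelin is the identity, so a second linear layer applied to the output of the first gives $\mathbf{W}^2(\mathbf{W}^1\mathbf{p} + \mathbf{b}^1) + \mathbf{b}^2 = (\mathbf{W}^2\mathbf{W}^1)\mathbf{p} + (\mathbf{W}^2\mathbf{b}^1 + \mathbf{b}^2)$, which is again a single linear layer. The plain-Python sketch below checks this numerically for one hypothetical two-layer network; the weights and helper names are made up for illustration.

```python
def matvec(M, v):
    # Matrix-vector product M v
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def matmul(A, B):
    # Matrix product A B, with (A B)[i][j] = sum over k of A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def vadd(u, v):
    # Elementwise vector sum
    return [x + y for x, y in zip(u, v)]


def two_layer_linear(W1, b1, W2, b2, p):
    # purelin is the identity, so a2 = W2 (W1 p + b1) + b2
    a1 = vadd(matvec(W1, p), b1)
    return vadd(matvec(W2, a1), b2)


def collapse(W1, b1, W2, b2):
    # Equivalent single layer: W = W2 W1 and b = W2 b1 + b2
    return matmul(W2, W1), vadd(matvec(W2, b1), b2)


# Hypothetical network: 2 inputs -> 3 linear neurons -> 1 linear neuron
W1 = [[1, 2], [0, -1], [3, 1]]
b1 = [1, 0, -2]
W2 = [[2, -1, 1]]
b2 = [0.5]

W, b = collapse(W1, b1, W2, b2)
p = [4, -3]
print(two_layer_linear(W1, b1, W2, b2, p), vadd(matvec(W, p), b))  # [2.5] [2.5]
```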


Creating a Linear Neuron (newlin)


Consider a single linear neuron with two inputs. The diagram for this network
is shown below.


[Figure: Input, Simple Linear Network]

$$a = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b)$$


The weight matrix W in this case has only one row. The network output is:

$$a = \mathrm{purelin}(n) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \mathbf{W}\mathbf{p} + b$$

or

$$a = w_{1,1}p_1 + w_{1,2}p_2 + b$$


Like the perceptron, the linear network has a decision boundary that is
determined by the input vectors for which the net input n is zero. For n = 0
the equation Wp + b = 0 specifies such a decision boundary as shown below
(adapted with thanks from [HDB96]).


[Figure: decision boundary Wp + b = 0, separating the a > 0 region from the a < 0 region]

Input vectors in the upper right gray area will lead to an output greater than
0. Input vectors in the lower left white area will lead to an output less than 0.
Thus, the linear network can be used to classify objects into two categories.
However, it can classify in this way only if the objects are linearly separable.
Thus, the linear network has the same limitation as the perceptron.
We can create a network like that shown above with the command


net = newlin([-1 1; -1 1],1);


The first matrix of arguments specifies the range of the two scalar inputs. The
last argument, 1, says that the network has a single output.


The network weights and biases are set to zero by default. You can see the 
current values with the commands 


W = net.IW{1,1}
W =
     0     0
and
b = net.b{1}
b =
     0


However, you can give the weights any values that you want, such as 2 and 3,
respectively, with


net.IW{1,1} = [2 3];
W = net.IW{1,1}
W =
     2     3


The bias can be set and checked in the same way.


net.b{1} = [-4];
b = net.b{1}
b =
    -4
You can simulate the linear network for a particular input vector. Try
p = [5;6];


Now you can find the network output with the function sim.

a = sim(net,p)
a =
    24







To summarize, you can create a linear network with newlin, adjust its
elements as you want, and simulate it with sim. You can find more about
newlin by typing help newlin.
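The same computation can be checked by hand. The Python sketch below assumes nothing beyond a = Wp + b; simulate_linear is a hypothetical helper, not a toolbox function. It reproduces the output 24 for W = [2 3], b = -4, p = [5;6], and uses the sign of a to tell on which side of the decision boundary Wp + b = 0 an input lies.

```python
def simulate_linear(W, b, p):
    """Return a = purelin(Wp + b) = Wp + b; W is a list of weight rows, b a list of biases."""
    return [sum(w * x for w, x in zip(row, p)) + bi for row, bi in zip(W, b)]

W = [[2.0, 3.0]]
b = [-4.0]
a = simulate_linear(W, b, [5.0, 6.0])
print(a)  # [24.0]

# The sign of the output tells which side of the boundary Wp + b = 0 an input is on.
for p in ([5.0, 6.0], [0.0, 0.0], [2.0, 0.0]):
    n = simulate_linear(W, b, p)[0]
    side = "a > 0" if n > 0 else ("a < 0" if n < 0 else "on the boundary")
    print(p, n, side)
```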
Mean Square Error


Like the perceptron learning rule, the least mean square error (LMS)
algorithm is an example of supervised training, in which the learning rule is
provided with a set of examples of desired network behavior:


$$\{\mathbf{p}_1, t_1\},\ \{\mathbf{p}_2, t_2\},\ \ldots,\ \{\mathbf{p}_Q, t_Q\}$$


Here $\mathbf{p}_q$ is an input to the network, and $t_q$ is the corresponding target output.
As each input is applied to the network, the network output is compared to the
target. The error is calculated as the difference between the target output and
the network output. We want to minimize the average of the sum of these
errors:


$$\mathrm{mse} = \frac{1}{Q}\sum_{k=1}^{Q} e(k)^2 = \frac{1}{Q}\sum_{k=1}^{Q}\left(t(k) - a(k)\right)^2$$


The LMS algorithm adjusts the weights and biases of the linear network so as
to minimize this mean square error.


Fortunately, the mean square error performance index for the linear network
is a quadratic function. Thus, the performance index will have either one global
minimum, a weak minimum, or no minimum, depending on the characteristics
of the input vectors. Specifically, the characteristics of the input vectors
determine whether or not a unique solution exists.


You can find more about this topic in Ch. 10 of [HDB96]. 
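For reference, the performance index can be computed directly from its definition. This Python sketch (the name mse echoes the toolbox's performance function, but this is an independent illustration) evaluates (1/Q) times the sum of (t(k) - a(k))^2. With the targets and simulated outputs from the linear classification example later in this chapter, it gives about 0.0999.

```python
def mse(targets, outputs):
    """Mean square error: (1/Q) * sum over k of (t(k) - a(k))**2."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((t - a) ** 2 for t, a in zip(targets, outputs)) / len(targets)

t = [0, 1, 0, 1]
print(mse(t, [0, 0, 0, 0]))                          # 0.5
print(mse(t, [0.0282, 0.9672, 0.2741, 0.4320]))      # about 0.0999
```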










Linear System Design (newlind) 


Unlike most other network architectures, linear networks can be designed
directly if input/target vector pairs are known. Specific network values for
weights and biases can be obtained to minimize the mean square error by using
the function newlind.


Suppose that the inputs and targets are 


P = [1 2 3];
T = [2.0 4.1 5.9];


Now you can design a network.
net = newlind(P,T);


You can simulate the network behavior to check that the design was done
properly.
Y = sim(net,P)
Y =
    2.0500    4.0000    5.9500


Note that the network outputs are quite close to the desired targets.
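newlind is described as choosing the weights and bias that minimize the mean square error. Assuming that means an ordinary least-squares fit, a minimal Python sketch of the same design solves the normal equations (X^T X) theta = X^T t, where each row of X is an input vector with a constant 1 appended for the bias. The helpers solve and design_linear are hypothetical. The normal equations are singular when the problem is underdetermined, in which case this sketch fails instead of picking a solution. For this data it reproduces the outputs 2.05, 4.00, 5.95 shown above.

```python
def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, y)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def design_linear(X, T):
    """Least-squares weights and bias for one linear neuron; the bias is returned last."""
    rows = [list(x) + [1.0] for x in X]   # extra constant input of 1 carries the bias
    n = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Xtt = [sum(r[i] * t for r, t in zip(rows, T)) for i in range(n)]
    return solve(XtX, Xtt)

P = [[1.0], [2.0], [3.0]]   # each inner list is one input vector
T = [2.0, 4.1, 5.9]
*w, b = design_linear(P, T)
Y = [w[0] * p[0] + b for p in P]
print([round(y, 4) for y in Y])   # [2.05, 4.0, 5.95]
```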


You might try demolin1. It shows error surfaces for a particular problem,
illustrates the design, and plots the designed solution.


The function newlind can also be used to design linear networks having delays
in the input. Such networks are discussed later in this chapter. First, however,
we need to discuss delays.
Linear Networks with Delays


Tapped Delay Line 


We need a new component, the tapped delay line, to make full use of the linear
network. Such a delay line is shown below. There the input signal enters from
the left and passes through N-1 delays. The output of the tapped delay line
(TDL) is an N-dimensional vector, made up of the input signal at the current
time, the previous input signal, etc.
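A tapped delay line is easy to model as a fixed-length buffer. The Python class below is a hypothetical sketch, not the toolbox's TDL implementation. It assumes the initial delay states are supplied oldest first, and at each step it returns [p(k), p(k-1), ..., p(k-N+1)].

```python
from collections import deque

class TappedDelayLine:
    """Hypothetical N-tap delay line whose output is [p(k), p(k-1), ..., p(k-N+1)]."""

    def __init__(self, n_taps, initial=None):
        # `initial` holds the N-1 previous inputs, oldest first (an assumed convention).
        past = [0.0] * (n_taps - 1) if initial is None else list(initial)
        if len(past) != n_taps - 1:
            raise ValueError("need exactly n_taps - 1 initial values")
        self.past = deque(reversed(past), maxlen=n_taps - 1)   # newest first

    def step(self, p):
        out = [p] + list(self.past)   # current input followed by the delayed ones
        self.past.appendleft(p)       # the oldest value falls off the right end
        return out

tdl = TappedDelayLine(3, initial=[1, 3])
result = [tdl.step(p) for p in [1, 2, 1]]
print(result)   # [[1, 3, 1], [2, 1, 3], [1, 2, 1]]
```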





Linear Filter


We can combine a tapped delay line with a linear network to create the linear
filter shown below.
Figure: Linear filter: a tapped delay line (TDL) feeding a linear layer.

The output of the filter is given by

$$a(k) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \sum_{i=1}^{R} w_{1,i}\,p(k - i + 1) + b$$
The network shown above is referred to in the digital signal-processing field as
a finite impulse response (FIR) filter [WiSt85]. Let us take a look at the code
that we use to generate and simulate such a specific network.
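The filter equation can be transcribed almost literally. In this hypothetical Python helper, w[0] multiplies the current input p(k) (the text's w_{1,1}), and inputs before the start of the signal come from initial (oldest first, an assumed convention) or default to zero. Feeding in a unit impulse returns the weights themselves and then zeros: the response dies out after R samples, which is what makes the impulse response finite.

```python
def fir_filter(w, b, signal, initial=None):
    """a(k) = sum over i = 1..R of w[i-1] * p(k-i+1), plus b, with R = len(w) taps."""
    R = len(w)
    history = [0.0] * (R - 1) if initial is None else list(initial)  # oldest first
    if len(history) != R - 1:
        raise ValueError("need R - 1 initial values")
    padded = history + list(signal)
    out = []
    for k in range(len(signal)):
        j = k + R - 1                  # position of p(k) in the padded sequence
        out.append(sum(w[i] * padded[j - i] for i in range(R)) + b)
    return out

print(fir_filter([1.0, 0.5, 0.25], 0.0, [1.0, 0.0, 0.0, 0.0]))   # [1.0, 0.5, 0.25, 0.0]
```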


Suppose that we want a linear layer that outputs the sequence T given the
sequence P and two initial input delay states Pi.


P = {1 2 1 3 3 2};
Pi = {1 3};
T = {5 6 4 20 7 8};


You can use newlind to design a network with delays to give the appropriate
outputs for the inputs. The delay initial outputs are supplied as a third
argument, as shown below.
net = newlind(P,T,Pi);
Now we obtain the output of the designed network with
Y = sim(net,P,Pi)
to give
Y = [2.73] [10.54] [5.01] [14.95] [10.78] [5.98]


As you can see, the network outputs are not exactly equal to the targets, but 
they are reasonably close, and in any case, the mean square error is minimized. 
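The delayed design can also be reproduced as a least-squares problem, again assuming newlind computes the minimum-MSE solution. Each regressor row holds [p(k), p(k-1), p(k-2)], and the sketch assumes the two entries of Pi are the inputs at times -1 and 0, oldest first. The helpers solve and design_linear are hypothetical. With this ordering, the fitted outputs agree with the Y values listed above to within 0.01, and the residuals are orthogonal to every regressor column, the defining property of a least-squares fit.

```python
def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, y)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def design_linear(X, T):
    """Least-squares weights and bias for one linear neuron; the bias is returned last."""
    rows = [list(x) + [1.0] for x in X]
    n = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Xtt = [sum(r[i] * t for r, t in zip(rows, T)) for i in range(n)]
    return solve(XtX, Xtt)

P = [1, 2, 1, 3, 3, 2]
Pi = [1, 3]              # assumed order: p(-1), then p(0)
T = [5, 6, 4, 20, 7, 8]

seq = Pi + P
X = [[seq[k + 2], seq[k + 1], seq[k]] for k in range(len(P))]  # [p(k), p(k-1), p(k-2)]
theta = design_linear(X, T)      # [w11, w12, w13, b]
Y = [sum(w * x for w, x in zip(theta, row + [1])) for row in X]
print([round(y, 2) for y in Y])
```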







LMS Algorithm (learnwh) 


The LMS algorithm, or Widrow-Hoff learning algorithm, is based on an
approximate steepest descent procedure. Here again, linear networks are
trained on examples of correct behavior.


Widrow and Hoff had the insight that they could estimate the mean square
error by using the squared error at each iteration. If we take the partial
derivative of the squared error with respect to the weights and biases at the kth
iteration, we have


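$$\frac{\partial e^2(k)}{\partial w_{1,j}} = 2e(k)\,\frac{\partial e(k)}{\partial w_{1,j}}$$

for j = 1, 2, ..., R and

$$\frac{\partial e^2(k)}{\partial b} = 2e(k)\,\frac{\partial e(k)}{\partial b}$$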


Next, look at the partial derivatives of the error e(k) itself:


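$$\frac{\partial e(k)}{\partial w_{1,j}} = \frac{\partial\,[t(k) - a(k)]}{\partial w_{1,j}} = \frac{\partial}{\partial w_{1,j}}\left[t(k) - \left(\mathbf{W}\mathbf{p}(k) + b\right)\right]$$

or

$$\frac{\partial e(k)}{\partial w_{1,j}} = \frac{\partial}{\partial w_{1,j}}\left[t(k) - \left(\sum_{i=1}^{R} w_{1,i}\,p_i(k) + b\right)\right]$$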

Here $p_i(k)$ is the ith element of the input vector at the kth iteration. Only the
term $w_{1,j}\,p_j(k)$ in the sum depends on $w_{1,j}$, and only $b$ depends on the bias,
so this simplifies to:

$$\frac{\partial e(k)}{\partial w_{1,j}} = -p_j(k)$$

and

$$\frac{\partial e(k)}{\partial b} = -1$$


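Finally, stepping along the negative of this gradient with learning rate α, the
changes to the weight matrix and the bias will be

$$2\alpha\,e(k)\,\mathbf{p}(k) \quad\text{and}\quad 2\alpha\,e(k).$$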


These two equations form the basis of the Widrow-Hoff (LMS) learning
algorithm.


These results can be extended to the case of multiple neurons, and written in
matrix form as

$$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha\,\mathbf{e}(k)\,\mathbf{p}^T(k)$$

$$\mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha\,\mathbf{e}(k)$$


Here the error e and the bias b are vectors, and α is a learning rate. If α is
large, learning occurs quickly, but if it is too large it may lead to instability, and
errors may even increase. To ensure stable learning, the learning rate must be
less than the reciprocal of the largest eigenvalue of the correlation matrix $\mathbf{p}^T\mathbf{p}$
of the input vectors.


You might want to read some of Chapter 10 of [HDB96] for more information 
about the LMS algorithm and its convergence. 
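As a minimal sketch of the matrix-form update (plain Python; the function name lms_step and the learning rate are hypothetical choices), one call computes a = Wp + b and the error e = t - a, then returns W + 2αep^T and b + 2αe. Repeating it on a single input/target pair shows the error shrinking by a constant factor each step, as long as α is small enough.

```python
def lms_step(W, b, p, t, alpha):
    """One Widrow-Hoff update: W <- W + 2*alpha*e*p^T, b <- b + 2*alpha*e."""
    a = [sum(w * x for w, x in zip(row, p)) + bi for row, bi in zip(W, b)]
    e = [ti - ai for ti, ai in zip(t, a)]
    W_new = [[w + 2 * alpha * ei * x for w, x in zip(row, p)] for row, ei in zip(W, e)]
    b_new = [bi + 2 * alpha * ei for bi, ei in zip(b, e)]
    return W_new, b_new, e

W, b = [[0.0, 0.0]], [0.0]
p, t = [1.0, 2.0], [1.0]
errors = []
for _ in range(30):
    W, b, e = lms_step(W, b, p, t, alpha=0.05)
    errors.append(e[0])
print(errors[:3])   # each error is 0.4 times the previous one for this p and alpha
```

Here the factor is 1 - 2α(p1² + p2² + 1) = 1 - 0.1 × 6 = 0.4; a larger α would push that factor below -1 and the error would grow.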


Fortunately, we have a toolbox function, learnwh, that does all of the calculation
for us. It calculates the change in weights as

dw = lr*e*p'
and the bias change as
db = lr*e

The constant 2, shown a few lines above, has been absorbed into the code
learning rate lr. The function maxlinlr calculates this maximum stable
learning rate lr as 0.999 times the reciprocal of the largest eigenvalue of P'*P.


Type help learnwh and help maxlinlr for more details about these two
functions.
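The two calculations can be sketched in Python as follows. This is not the toolbox source. widrow_hoff_increments mirrors the dw and db formulas above. max_stable_lr assumes maxlinlr's rule is 0.999 divided by the largest eigenvalue of the input correlation matrix and finds that eigenvalue by power iteration on PP^T, which has the same nonzero eigenvalues as P^T P. The with_bias flag, which appends a row of ones for the bias input, is a hypothetical option of this sketch.

```python
def largest_eigenvalue(S, iters=500):
    """Power iteration for a symmetric positive semidefinite matrix S."""
    v = [1.0] * len(S)          # assumed not orthogonal to the top eigenvector
    lam = 0.0
    for _ in range(iters):
        w = [sum(s * x for s, x in zip(row, v)) for row in S]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        Sv = [sum(s * x for s, x in zip(row, v)) for row in S]
        lam = sum(x * y for x, y in zip(v, Sv))   # Rayleigh quotient
    return lam

def max_stable_lr(P, with_bias=False):
    """0.999 / (largest eigenvalue of P P^T); P has one row per input element."""
    rows = [[float(x) for x in r] for r in P]
    if with_bias:
        rows.append([1.0] * len(rows[0]))   # constant input that drives the bias
    S = [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]
    return 0.999 / largest_eigenvalue(S)

def widrow_hoff_increments(lr, e, p):
    """dW = lr * e * p' and db = lr * e for an error vector e and an input vector p."""
    return [[lr * ei * x for x in p] for ei in e], [lr * ei for ei in e]

P = [[2, 1, -2, -1], [2, -2, 2, 1]]
print(max_stable_lr(P))
print(widrow_hoff_increments(0.5, [2.0], [1.0, 3.0]))   # ([[1.0, 3.0]], [1.0])
```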
Linear Classification (train)


Linear networks can be trained to perform linear classification with the
function train. This function applies each vector of a set of input vectors and
calculates the network weight and bias increments due to each of the inputs
according to learnwh. Then the network is adjusted with the sum of all these
corrections. We will call each pass through the input vectors an epoch. This
contrasts with adapt, discussed in “Adaptive Filters and Adaptive Training” in
Chapter 10, which adjusts weights for each input vector as it is presented.


Finally, train applies the inputs to the new network, calculates the outputs,
compares them to the associated targets, and calculates a mean square error.
If the error goal is met, or if the maximum number of epochs is reached, the
training is stopped and train returns the new network and a training record.
Otherwise train goes through another epoch. Fortunately, the LMS algorithm
converges when this procedure is executed.


To illustrate this procedure, we will work through a simple problem. Consider 
the linear network introduced earlier in this chapter. 


Figure: Simple Linear Network: a = purelin(Wp + b)


Next suppose we have the classification problem presented in “Linear Filters” 
in Chapter 4. 


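$$\left\{\mathbf{p}_1 = \begin{bmatrix}2\\2\end{bmatrix}, t_1 = 0\right\}\ \left\{\mathbf{p}_2 = \begin{bmatrix}1\\-2\end{bmatrix}, t_2 = 1\right\}\ \left\{\mathbf{p}_3 = \begin{bmatrix}-2\\2\end{bmatrix}, t_3 = 0\right\}\ \left\{\mathbf{p}_4 = \begin{bmatrix}-1\\1\end{bmatrix}, t_4 = 1\right\}$$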


Here we have four input vectors, and we would like a network that produces 
the output corresponding to each input vector when that vector is presented. 
We will use train to get the weights and biases for a network that produces
the correct targets for each input vector. The initial weights and bias for the
new network will be 0 by default. We will set the error goal to 0.1 rather than
accept its default of 0.


P = [2 1 -2 -1;2 -2 2 1];
t = [0 1 0 1];
net = newlin([-2 2; -2 2],1);
net.trainParam.goal = 0.1;
[net,tr] = train(net,P,t);


The problem runs, producing the following training record. 


TRAINB, Epoch 0/100, MSE 0.5/0.1.
TRAINB, Epoch 25/100, MSE 0.181122/0.1.
TRAINB, Epoch 50/100, MSE 0.111233/0.1.
TRAINB, Epoch 64/100, MSE 0.0999066/0.1.
TRAINB, Performance goal met.


Thus, the performance goal is met in 64 epochs. The new weights and bias are 


Weights = net.iw{1,1} 
Weights = 
-0.0615 -0.2194 
bias = net.b(1) 
bias = 
[0.5899 ] 


We can simulate the new network as shown below.
A = sim(net,P)
A =
    0.0282    0.9672    0.2741    0.4320
We also can calculate the error.
err = t - sim(net,P)
err =
   -0.0282    0.0328   -0.2741    0.5680


Note that the targets are not realized exactly. The problem would have run
longer in an attempt to get perfect results had we chosen a smaller error goal,
but in this problem it is not possible to obtain a goal of 0. The network is limited
in its capability. See “Limitations and Cautions” at the end of this chapter for
examples of various limitations.
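The batch procedure described for train can be imitated in a few lines of Python. This sketch assumes a learning rate of 0.01 (not stated in the text), starts from zero weights, sums the Widrow-Hoff increments over all four input vectors, and applies them once per epoch until the MSE reaches the 0.1 goal. Its epoch count depends on that assumed rate and need not match the toolbox log above. Solving the least-squares problem for this data by hand gives a minimum MSE of roughly 0.091, just under the goal, which is why a goal of 0 is unreachable.

```python
# Input vectors (the columns of P in the text) and their targets.
P = [[2.0, 2.0], [1.0, -2.0], [-2.0, 2.0], [-1.0, 1.0]]
t = [0.0, 1.0, 0.0, 1.0]
lr, goal, max_epochs = 0.01, 0.1, 1000   # lr = 0.01 is an assumed value

w, b = [0.0, 0.0], 0.0                   # zero initial weights and bias
history = []
for epoch in range(max_epochs + 1):
    a = [w[0] * p[0] + w[1] * p[1] + b for p in P]
    e = [ti - ai for ti, ai in zip(t, a)]
    perf = sum(ei * ei for ei in e) / len(e)   # mean square error
    history.append(perf)
    if perf <= goal or epoch == max_epochs:
        break
    # Batch update: add up the Widrow-Hoff increments for every input, apply them once.
    w = [w[i] + lr * sum(ei * p[i] for ei, p in zip(e, P)) for i in range(2)]
    b += lr * sum(e)

print(epoch, perf, w, b)
```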


The demonstration program demolin2 shows the training of a linear neuron
and plots the weight trajectory and error during training.


You also might try running the demonstration program nnd10lc. It addresses
a classic and historically interesting problem, shows how a network can be
trained to classify various patterns, and shows how the trained network responds
when noisy patterns are presented.







Limitations and Cautions
Linear networks may only learn linear relationships between input and output
vectors. Thus, they cannot find solutions to some problems. However, even if a
perfect solution does not exist, the linear network will minimize the sum of
squared errors if the learning rate lr is sufficiently small. The network will
find as close a solution as is possible given the linear nature of the network's
architecture. This property holds because the error surface of a linear network
is a multidimensional parabola. Since parabolas have only one minimum, a
gradient descent algorithm (such as the LMS rule) must produce a solution at
that minimum.


Linear networks have various other limitations. Some of them are discussed
below.


Overdetermined Systems 


Consider an overdetermined system. Suppose that we have a network to be
trained with four one-element input vectors and four targets. A perfect solution
to wp + b = t for each of the inputs may not exist, for there are four
constraining equations and only one weight and one bias to adjust. However,
the LMS rule will still minimize the error. You might try demolin4 to see how
this is done.
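demolin4's data is not given here, so the Python sketch below uses four hypothetical one-element inputs and targets. It computes the closed-form least-squares w and b and shows that batch Widrow-Hoff iterations (learning rate 0.02, an assumed value) settle on the same pair, even though the residuals cannot all be zero.

```python
P = [1.0, 2.0, 3.0, 4.0]     # hypothetical one-element inputs
T = [0.9, 2.2, 2.8, 4.3]     # hypothetical targets that do not lie on one line
Q = len(P)

# Closed-form least-squares line for a single input: the minimum-MSE w and b.
mp, mt = sum(P) / Q, sum(T) / Q
w_ls = sum((p - mp) * (t - mt) for p, t in zip(P, T)) / sum((p - mp) ** 2 for p in P)
b_ls = mt - w_ls * mp

# Batch Widrow-Hoff iterations from zero reach the same w and b.
w, b, lr = 0.0, 0.0, 0.02
for _ in range(20000):
    e = [t - (w * p + b) for p, t in zip(P, T)]
    w += lr * sum(ei * p for ei, p in zip(e, P))
    b += lr * sum(e)

residuals = [t - (w_ls * p + b_ls) for p, t in zip(P, T)]
print(w_ls, b_ls, w, b)
print(residuals)   # not all zero: four constraints, only two parameters
```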


Underdetermined Systems 


Consider a single linear neuron with one input. This time, in demolin5, we will
train it on only one one-element input vector and its one-element target vector:


P = [+1.0]; 
T = [+0.5]; 


Note that while there is only one constraint arising from the single input/target
pair, there are two variables, the weight and the bias. Having more variables
than constraints results in an underdetermined problem with an infinite
number of solutions. You can try demolin5 to explore this topic.
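With P = 1 and T = 0.5, every pair with w + b = 0.5 is an exact solution. The Python sketch below checks a few of them, then runs batch Widrow-Hoff updates from zero (learning rate 0.1, an assumed value). Because p = 1, the weight and bias receive identical updates, so they end at w = b = 0.25; a different starting point would end at a different exact solution, since w - b never changes.

```python
p, t = 1.0, 0.5
for w0, b0 in [(0.5, 0.0), (0.0, 0.5), (0.25, 0.25), (1.5, -1.0)]:
    assert abs(w0 * p + b0 - t) < 1e-12   # each pair fits the single constraint exactly

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    e = t - (w * p + b)
    w, b = w + lr * e * p, b + lr * e
print(w, b)   # both approach 0.25
```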


Linearly Dependent Vectors 


Normally it is a straightforward job to determine whether or not a linear
network can solve a problem. Commonly, if a linear network has at least as
many degrees of freedom (S*R + S = number of weights and biases) as
constraints (Q = pairs of input/target vectors), then the network can solve the
problem. This is true except when the input vectors are linearly dependent and
they are applied to a network without biases. In this case, as shown with
demonstration script demolin6, the network cannot solve the problem with
zero error. You might want to try demolin6.
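A two-vector example shows why the bias matters here. In this hypothetical Python sketch p2 = 2p1, so without a bias the network can only produce outputs in the ratio 1:2, while the targets are in the ratio 1:3. Two weights against two constraints is not enough; adding a bias makes an exact fit possible.

```python
p1, p2 = [1.0, 2.0], [2.0, 4.0]   # hypothetical inputs; p2 = 2 * p1 (linearly dependent)
t1, t2 = 1.0, 3.0                  # hypothetical targets with t2 != 2 * t1

def out(W, b, p):
    return W[0] * p[0] + W[1] * p[1] + b

# No bias: W p2 = 2 (W p1) for every W, so only s = W p1 matters.
# The best s minimizes (t1 - s)**2 + (t2 - 2*s)**2, giving s = (t1 + 2*t2) / 5.
s = (t1 + 2 * t2) / 5
W_nb = [s / 5, 2 * s / 5]          # one weight vector with W p1 = s
sse_nb = (t1 - out(W_nb, 0.0, p1)) ** 2 + (t2 - out(W_nb, 0.0, p2)) ** 2
print(sse_nb)                      # about 0.2, never 0

# With a bias the same two constraints are met exactly, e.g. W = [2, 0], b = -1.
W_b, b = [2.0, 0.0], -1.0
print(out(W_b, b, p1), out(W_b, b, p2))   # 1.0 3.0
```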


Too Large a Learning Rate 


A linear network can always be trained with the Widrow-Hoff rule to find the
minimum error solution for its weights and biases, as long as the learning rate
is small enough. Demonstration script demolin7 shows what happens when a
neuron with one input and a bias is trained with a learning rate larger than
that recommended by maxlinlr. The network is trained with two different
learning rates to show the results of using too large a learning rate.
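demolin7's settings are not reproduced here. Instead, this Python sketch uses hypothetical data and the batch update w += lr·Σe·p, b += lr·Σe, whose stability limit is lr < 2/λ_max, where λ_max is the largest eigenvalue of [[Σp², Σp], [Σp, Q]]. A rate of 0.999/λ_max converges toward the least-squares error; a rate of 2.5/λ_max multiplies one error component by -1.5 every epoch, so training diverges.

```python
# Hypothetical one-input data; the neuron has one weight w and a bias b.
P = [1.0, 2.0, 3.0, 4.0]
T = [0.9, 2.2, 2.8, 4.3]

# Largest eigenvalue of [[a, c], [c, d]] in closed form (symmetric 2x2 matrix).
a, c, d = sum(p * p for p in P), sum(P), float(len(P))
lam_max = (a + d) / 2 + (((a - d) / 2) ** 2 + c * c) ** 0.5

def batch_lms_mse(lr, epochs=300):
    """Train w and b from zero with batch Widrow-Hoff updates; return the final MSE."""
    w = b = 0.0
    for _ in range(epochs):
        e = [t - (w * p + b) for p, t in zip(P, T)]
        w += lr * sum(ei * p for ei, p in zip(e, P))
        b += lr * sum(e)
    e = [t - (w * p + b) for p, t in zip(P, T)]
    return sum(ei * ei for ei in e) / len(e)

mse_small = batch_lms_mse(0.999 / lam_max)   # converges toward the minimum MSE
mse_big = batch_lms_mse(2.5 / lam_max)       # diverges: errors grow every epoch
print(mse_small, mse_big)
```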
Summary


Single-layer linear networks can perform linear function approximation or
pattern association.


Single-layer linear networks can be designed directly or trained with the
Widrow-Hoff rule to find a minimum error solution. In addition, linear
networks can be trained adaptively, allowing the network to track changes in
its environment.


The design of a single-layer linear network is constrained completely by the
problem to be solved. The number of network inputs and the number of
neurons in the layer are determined by the number of inputs and outputs
required by the problem.


Multiple layers in a linear network do not result in a more powerful network,
so the single layer is not a limitation. However, linear networks can solve only
linear problems.


Nonlinear relationships between inputs and targets cannot be represented
exactly by a linear network. The networks discussed in this chapter make a
linear approximation with the minimum sum-squared error.


If the relationship between inputs and targets is linear or a linear
approximation is desired, then linear networks are made for the job.
Otherwise, backpropagation may be a good alternative.







Figures and Equations 


Linear Neuron 


Figure: Linear Neuron with Vector Input: a = purelin(Wp + b), where R = number of elements in the input vector.


Purelin Transfer Function 





Figure: Linear Transfer Function, a = purelin(n)
Linear Network Layer


Figure: Layer of Linear Neurons (detailed view, left; layer diagram, right): a = purelin(Wp + b), where R = number of elements in the input vector and S = number of neurons in the layer.


Simple Linear Network 


Figure: Simple Linear Network: a = purelin(Wp + b)
Decision Boundary













Mean Square Error 


$$\mathrm{mse} = \frac{1}{Q}\sum_{k=1}^{Q} e(k)^2 = \frac{1}{Q}\sum_{k=1}^{Q}\left(t(k) - a(k)\right)^2$$







Tapped Delay Line 
Linear Filter


Figure: Linear filter: a tapped delay line (TDL) feeding a linear layer.





LMS (Widrow-Hoff) Algorithm 
$$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha\,\mathbf{e}(k)\,\mathbf{p}^T(k)$$

$$\mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha\,\mathbf{e}(k)$$


New Functions 
This chapter introduces the following new functions. 








Function    Description
newlin      Create a linear layer.
newlind     Design a linear layer.
learnwh     Widrow-Hoff weight/bias learning rule.
purelin     A linear transfer function.
Preprocessing and Postprocessing
Mean and Stand. Dev. (prestd, poststd, trastd)
Principal Component Analysis (prepca, trapca)
Sample Training Session
Limitations and Cautions
Overview


Backpropagation was created by generalizing the Widrow-Hoff learning rule to
multiple-layer networks and nonlinear differentiable transfer functions. Input
vectors and the corresponding target vectors are used to train a network until
it can approximate a function, associate input vectors with specific output
vectors, or classify input vectors in an appropriate way as defined by you.
Networks with biases, a sigmoid layer, and a linear output layer are capable of
approximating any function with a finite number of discontinuities.


Standard backpropagation is a gradient descent algorithm, as is the
Widrow-Hoff learning rule, in which the network weights are moved along the
negative of the gradient of the performance function. The term
backpropagation refers to the manner in which the gradient is computed for
nonlinear multilayer networks. There are a number of variations on the basic
algorithm that are based on other standard optimization techniques, such as
conjugate gradient and Newton methods. The Neural Network Toolbox
implements a number of these variations. This chapter explains how to use
each of these routines and discusses the advantages and disadvantages of each.


Properly trained backpropagation networks tend to give reasonable answers
when presented with inputs that they have never seen. Typically, a new input
leads to an output similar to the correct output for input vectors used in
training that are similar to the new input being presented. This generalization
property makes it possible to train a network on a representative set of input/
target pairs and get good results without training the network on all possible
input/output pairs. There are two features of the Neural Network Toolbox that
are designed to improve network generalization: regularization and early
stopping. These features and their use are discussed later in this chapter.


This chapter also discusses preprocessing and postprocessing techniques,
which can improve the efficiency of network training.


Before beginning this chapter you may want to read a basic reference on
backpropagation, such as D. E. Rumelhart, G. E. Hinton, and R. J. Williams,
“Learning internal representations by error propagation,” in D. Rumelhart and J. L.
McClelland, editors, Parallel Distributed Processing, Vol. 1, Chapter 8, The M.I.T.
Press, Cambridge, MA, 1986, pp. 318-362. This subject is also covered in detail
in Chapters 11 and 12 of M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural
Network Design, PWS Publishing Company, Boston, MA, 1996.







The primary objective of this chapter is to explain how to use the
backpropagation training functions in the toolbox to train feedforward neural
networks to solve specific problems. There are generally four steps in the
training process:


1 Assemble the training data
2 Create the network object
3 Train the network
4 Simulate the network response to new inputs
This chapter discusses a number of different training functions, but in using
each function we generally follow these four steps.


The next section describes the basic feedforward network structure and
demonstrates how to create a feedforward network object. Then the simulation
and training of the network objects are presented.
Fundamentals


Architecture


This section presents the architecture of the network that is most commonly
used with the backpropagation algorithm: the multilayer feedforward
network. The routines in the Neural Network Toolbox can be used to train
more general networks; some of these will be briefly discussed in later
chapters.


Neuron Model (tansig, logsig, purelin) 

An elementary neuron with R inputs is shown below. Each input is weighted
with an appropriate w. The sum of the weighted inputs and the bias forms the
input to the transfer function f. Neurons may use any differentiable transfer
function f to generate their output.


Figure: General Neuron: a = f(Wp + b), where R = number of elements in the input vector.


Multilayer networks often use the log-sigmoid transfer function logsig.





Figure: Log-Sigmoid Transfer Function, a = logsig(n)







The function logsig generates outputs between 0 and 1 as the neuron's net
input goes from negative to positive infinity.


Alternatively, multilayer networks may use the tan-sigmoid transfer function 
tansig. 





Figure: Tan-Sigmoid Transfer Function, a = tansig(n)


Occasionally, the linear transfer function purelin is used in backpropagation
networks.





Figure: Linear Transfer Function, a = purelin(n)


If the last layer of a multilayer network has sigmoid neurons, then the outputs
of the network are limited to a small range. If linear output neurons are used,
the network outputs can take on any value.


In backpropagation it is important to be able to calculate the derivatives of any
transfer functions used. Each of the transfer functions above, tansig, logsig,
and purelin, has a corresponding derivative function: dtansig, dlogsig, and
dpurelin. To get the name of a transfer function's associated derivative
function, call the transfer function with the string 'deriv'.


tansig('deriv')
ans = dtansig
The three transfer functions described here are the most commonly used
transfer functions for backpropagation, but other differentiable transfer
functions can be created and used with backpropagation if desired. See
“Advanced Topics” in Chapter 12.
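For reference, here are the three transfer functions and their derivatives as a plain Python sketch. The derivative helpers take (n, a) and use the output a, since backpropagation already has it; that argument order is an assumption of this sketch, not a statement about the toolbox's dtansig or dlogsig signatures. A central-difference check confirms each derivative at one point.

```python
import math

def logsig(n):
    """Log-sigmoid 1 / (1 + exp(-n)), written to avoid overflow for large |n|."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    z = math.exp(n)
    return z / (1.0 + z)

def tansig(n):
    """Tan-sigmoid 2 / (1 + exp(-2n)) - 1, which is mathematically tanh(n)."""
    return math.tanh(n)

def purelin(n):
    return n

# Derivatives expressed through the output a = f(n), the form backpropagation reuses.
def dlogsig(n, a):
    return a * (1.0 - a)

def dtansig(n, a):
    return 1.0 - a * a

def dpurelin(n, a):
    return 1.0

for f, df in [(logsig, dlogsig), (tansig, dtansig), (purelin, dpurelin)]:
    n, h = 0.3, 1e-6
    numeric = (f(n + h) - f(n - h)) / (2 * h)
    print(f.__name__, df(n, f(n)), numeric)   # analytic and numeric slopes agree
```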


Feedforward Network 


A single-layer network of S logsig neurons having R inputs is shown below in
full detail on the left and with a layer diagram on the right.


Figure: Layer of Neurons (full detail, left; layer diagram, right): a = f(Wp + b), where R = number of elements in the input vector and S = number of neurons in the layer.


Feedforward networks often have one or more hidden layers of sigmoid
neurons followed by an output layer of linear neurons. Multiple layers of
neurons with nonlinear transfer functions allow the network to learn nonlinear
and linear relationships between input and output vectors. The linear output
layer lets the network produce values outside the range -1 to +1.


On the other hand, if you want to constrain the outputs of a network (such as
between 0 and 1), then the output layer should use a sigmoid transfer function
(such as logsig).


As noted in Chapter 2, for multiple-layer networks we use the number of the 
layers to determine the superscript on the weight matrices. The appropriate 
notation is used in the two-layer tansig/purelin network shown next. 
Figure: Two-layer network (Input, Hidden Layer, Output Layer): a1 = tansig(IW1,1 p1 + b1), a2 = purelin(LW2,1 a1 + b2)


This network can be used as a general function approximator. It can
approximate any function with a finite number of discontinuities arbitrarily
well, given sufficient neurons in the hidden layer.
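The two equations in the figure translate directly into code. This Python sketch uses hypothetical weights for a 2-3-1 tansig/purelin network (tansig is computed with math.tanh, which is the same function) and simulates a small batch by applying the network to each input vector in turn. Because the output layer is linear, its values are not confined to the range -1 to +1.

```python
import math

def layer(W, b, p, f):
    """One layer: f(W p + b), with the transfer function applied elementwise."""
    return [f(sum(w * x for w, x in zip(row, p)) + bi) for row, bi in zip(W, b)]

def two_layer(IW, b1, LW, b2, p):
    a1 = layer(IW, b1, p, math.tanh)          # hidden layer: tansig
    return layer(LW, b2, a1, lambda n: n)     # output layer: purelin

# Hypothetical weights for a network with 2 inputs, 3 hidden neurons, and 1 output.
IW = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b1 = [0.1, -0.2, 0.0]
LW = [[1.0, -2.0, 0.5]]
b2 = [0.3]

# Batch simulation: apply the network to each input vector of a set.
inputs = [[1.0, 2.0], [3.0, 4.0], [2.0, 1.0]]
outputs = [two_layer(IW, b1, LW, b2, p) for p in inputs]
print(outputs)
```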


Creating a Network (newff)
The first step in training a feedforward network is to
create the network object. The function newff creates a feedforward network.
It requires four inputs and returns the network object. The first input is an R
by 2 matrix of minimum and maximum values for each of the R elements of the
input vector. The second input is an array containing the sizes of each layer.
The third input is a cell array containing the names of the transfer functions to
be used in each layer. The final input contains the name of the training
function to be used.


For example, the following command creates a two-layer network. There is one
input vector with two elements. The values for the first element of the input
vector range between -1 and 2, and the values of the second element of the input
vector range between 0 and 5. There are three neurons in the first layer and
one neuron in the second (output) layer. The transfer function in the first layer
is tan-sigmoid, and the output layer transfer function is linear. The training
function is traingd (which is described in a later section).


net=newff([-1 2; 0 5],[3,1],{'tansig','purelin'},'traingd');
This command creates the network object and also initializes the weights and
biases of the network; therefore the network is ready for training. There are
times when you may want to reinitialize the weights, or to perform a custom
initialization. The next section explains the details of the initialization process.


Initializing Weights (init)
Before training a feedforward network, the weights and
biases must be initialized. The newff command will automatically initialize the
weights, but you may want to reinitialize them. This can be done with the
command init. This function takes a network object as input and returns a
network object with all weights and biases initialized. Here is how a network
is initialized (or reinitialized):


net = init(net);


For specifics on how the weights are initialized, see Chapter 12. 


Simulation (sim)
The function sim simulates a network. sim takes the network input p and the
network object net, and returns the network outputs a. Here is how you can use
sim to simulate the network we created above for a single input vector:


p = [1;2];
a = sim(net,p)
a =
   -0.1011


(If you try these commands, your output may be different, depending on the
state of your random number generator when the network was initialized.)
Below, sim is called to calculate the outputs for a concurrent set of three input
vectors. This is the batch mode form of simulation, in which all of the input
vectors are placed in one matrix. This is much more efficient than presenting the
vectors one at a time.


p = [1 3 2;2 4 1];
a = sim(net,p)
a =
   -0.1011   -0.2308    0.4955


Training 

Once the network weights and biases have been initialized, the network is
ready for training. The network can be trained for function approximation
(nonlinear regression), pattern association, or pattern classification. The
training process requires a set of examples of proper network behavior:
network inputs p and target outputs t. During training the weights and biases
of the network are iteratively adjusted to minimize the network performance
function net.performFcn. The default performance function for feedforward
networks is mean square error mse, the average squared error between the
network outputs a and the target outputs t.


The remainder of this chapter describes several different training algorithms
for feedforward networks. All of these algorithms use the gradient of the
performance function to determine how to adjust the weights to minimize
performance. The gradient is determined using a technique called
backpropagation, which involves performing computations backwards through
the network. The backpropagation computation is derived using the chain rule
of calculus and is described in Chapter 11 of [HDB96].


The basic backpropagation training algorithm, in which the weights are moved
in the direction of the negative gradient, is described in the next section. Later
sections describe more complex algorithms that increase the speed of
convergence.


Backpropagation Algorithm 


There are many variations of the backpropagation algorithm, several of which
we discuss in this chapter. The simplest implementation of backpropagation
learning updates the network weights and biases in the direction in which the
performance function decreases most rapidly, the negative of the gradient.
One iteration of this algorithm can be written


$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\,\mathbf{g}_k$$


where $\mathbf{x}_k$ is a vector of current weights and biases, $\mathbf{g}_k$ is the current gradient,
and $\alpha_k$ is the learning rate.
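To see the update rule and the role of the learning rate in isolation, this Python sketch applies x_{k+1} = x_k - α g_k to a hypothetical quadratic performance function F(x) = x1² + 25 x2². Its curvature along x2 is 50, so a fixed rate must stay below 2/50 = 0.04: α = 0.01 converges, while α = 0.05 multiplies the x2 component by -1.5 every step, so it grows without bound.

```python
def gradient_descent(grad, x0, alpha, iters):
    """Apply x_{k+1} = x_k - alpha * g_k with a fixed learning rate alpha."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

def F(x):
    """Hypothetical quadratic performance function with its minimum at (0, 0)."""
    return x[0] ** 2 + 25 * x[1] ** 2

def grad_F(x):
    return [2 * x[0], 50 * x[1]]

x_ok = gradient_descent(grad_F, [0.5, 0.5], alpha=0.01, iters=500)
x_bad = gradient_descent(grad_F, [0.5, 0.5], alpha=0.05, iters=50)
print(F(x_ok), F(x_bad))   # a tiny value versus a huge one
```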


There are two different ways in which this gradient descent algorithm can be
implemented: incremental mode and batch mode. In incremental mode, the
gradient is computed and the weights are updated after each input is applied
to the network. In batch mode, all of the inputs are applied to the network
before the weights are updated. The next section describes the batch mode of
training; incremental training will be discussed in a later chapter.


Batch Training (train)
In batch mode the weights and biases of the network are
updated only after the entire training set has been applied to the network. The
gradients calculated at each training example are added together to determine
the change in the weights and biases. For a discussion of batch training with
the backpropagation algorithm, see page 12-7 of [HDB96].




Batch Gradient Descent (traingd)
The batch steepest descent training function is
traingd. The weights and biases are updated in the direction of the negative
gradient of the performance function. If you want to train a network using
batch steepest descent, you should set the network trainFcn to traingd, and
then call the function train. There is only one training function associated
with a given network.


There are seven training parameters associated with traingd: epochs, show,
goal, time, min_grad, max_fail, and lr. The learning rate lr is multiplied
by the negative of the gradient to determine the changes to the weights and
biases. The larger the learning rate, the bigger the step. If the learning rate is
made too large, the algorithm becomes unstable. If the learning rate is set too
small, the algorithm takes a long time to converge. See page 12-8 of [HDB96]
for a discussion of the choice of learning rate.


The training status is displayed every show iterations of the algorithm. (If
show is set to NaN, then the training status never displays.) The other
parameters determine when the training stops. The training stops if the
number of iterations exceeds epochs, if the performance function drops below
goal, if the magnitude of the gradient is less than min_grad, or if the training
time is longer than time seconds. We discuss max_fail, which is associated
with the early stopping technique, in the section on improving generalization.


The following code creates a training set of inputs p and targets t. For batch
training, all of the input vectors are placed in one matrix.
p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];


Next, we create the feedforward network. Here we use the function minmax to
determine the range of the inputs to be used in creating the network.
net=newff(minmax(p),[3,1],{'tansig','purelin'},'traingd');

At this point, we might want to modify some of the default training parameters.


net.trainParam.show = 50;
net.trainParam.lr = 0.05;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;


If you want to use the default training parameters, the above commands are
not necessary.
Now we are ready to train the network.


[net,tr]=train(net,p,t);
TRAINGD, Epoch 0/300, MSE 1.59423/1e-05, Gradient 2.76799/1e-10
TRAINGD, Epoch 50/300, MSE 0.00236382/1e-05, Gradient 0.0495292/1e-10
TRAINGD, Epoch 100/300, MSE 0.000435947/1e-05, Gradient 0.0161202/1e-10
TRAINGD, Epoch 150/300, MSE 8.68462e-05/1e-05, Gradient 0.00769588/1e-10
TRAINGD, Epoch 200/300, MSE 1.45042e-05/1e-05, Gradient 0.00325667/1e-10
TRAINGD, Epoch 211/300, MSE 9.64816e-06/1e-05, Gradient 0.00266775/1e-10
TRAINGD, Performance goal met.


The training record tr contains information about the progress of training. An example of its use is given in the Sample Training Session near the end of this chapter.


Now the trained network can be simulated to obtain its response to the inputs in the training set.

a = sim(net,p)
a =
   -1.0010   -0.9989    1.0018    0.9985


Try the Neural Network Design Demonstration nnd12sd1 [HDB96] for an illustration of the performance of the batch gradient descent algorithm.


Batch Gradient Descent with Momentum (traingdm). In addition to traingd, there is another batch algorithm for feedforward networks that often provides faster convergence: traingdm, steepest descent with momentum. Momentum allows a network to respond not only to the local gradient, but also to recent trends in the error surface. Acting like a low-pass filter, momentum allows the network to ignore small features in the error surface. Without momentum a network may get stuck in a shallow local minimum. With momentum a network can slide through such a minimum. See page 12-9 of [HDB96] for a discussion of momentum.
Momentum can be added to backpropagation learning by making weight changes equal to the sum of a fraction of the last weight change and the new change suggested by the backpropagation rule. The magnitude of the effect that the last weight change is allowed to have is mediated by a momentum constant, mc, which can be any number between 0 and 1. When the momentum constant is 0, a weight change is based solely on the gradient. When the momentum constant is 1, the new weight change is set to equal the last weight change and the gradient is simply ignored. The gradient is computed by summing the gradients calculated at each training example, and the weights and biases are only updated after all training examples have been presented.


If the new performance function on a given iteration exceeds the performance function on a previous iteration by more than a predefined ratio max_perf_inc (typically 1.04), the new weights and biases are discarded, and the momentum coefficient mc is set to zero.
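The momentum rule and the max_perf_inc test can be sketched in a few lines of Python. This is an illustration, not the traingdm source. Its assumptions:

- It uses the common form dx = mc*dx_prev - (1 - mc)*lr*grad. This form reproduces both limits described above: mc = 0 uses only the gradient, and mc = 1 repeats the last change.
- The function names are hypothetical.
- Restoring mc after an accepted step is a choice made for this sketch.

```python
def momentum_step(x, dx_prev, grad, lr, mc):
    """One gradient-descent-with-momentum step on a list of parameters.

    dx = mc * dx_prev - (1 - mc) * lr * grad
    mc = 0 gives a plain gradient step; mc = 1 repeats dx_prev and
    ignores the gradient.
    """
    dx = [mc * dp - (1.0 - mc) * lr * g for dp, g in zip(dx_prev, grad)]
    x_new = [xi + d for xi, d in zip(x, dx)]
    return x_new, dx


def train_with_momentum(perf, grad_fn, x, lr, mc, epochs, max_perf_inc=1.04):
    """Batch training loop with the max_perf_inc rejection rule.

    A step that raises perf by more than the ratio max_perf_inc is
    discarded and the next step is taken with mc = 0.
    """
    f = perf(x)
    dx = [0.0] * len(x)
    mc_now = mc
    for _ in range(epochs):
        x_new, dx_new = momentum_step(x, dx, grad_fn(x), lr, mc_now)
        f_new = perf(x_new)
        if f_new > f * max_perf_inc:
            mc_now = 0.0   # reject: keep the old weights and drop momentum
            continue
        x, dx, f = x_new, dx_new, f_new
        mc_now = mc        # assumption: momentum restored once a step is accepted
    return x, f


if __name__ == "__main__":
    # f(x) = x^2 with gradient 2x; lr and mc match the traingdm example values.
    x, f = train_with_momentum(lambda v: v[0] ** 2, lambda v: [2.0 * v[0]],
                               [1.0], lr=0.05, mc=0.9, epochs=1000)
    print(x, f)
```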


The batch form of gradient descent with momentum is invoked using the training function traingdm. The traingdm function is invoked using the same steps shown above for the traingd function, except that the mc, lr and max_perf_inc learning parameters can all be set.


In the following code we recreate our previous network and retrain it using gradient descent with momentum. The training parameters for traingdm are the same as those for traingd, with the addition of the momentum factor mc and the maximum performance increase max_perf_inc. (The training parameters are reset to the default values whenever net.trainFcn is set to traingdm.)


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'traingdm');
net.trainParam.show = 50;
net.trainParam.lr = 0.05;
net.trainParam.mc = 0.9;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINGDM, Epoch 0/300, MSE 3.6913/1e-05, Gradient 4.54729/1e-10
TRAINGDM, Epoch 50/300, MSE 0.00532188/1e-05, Gradient 0.213222/1e-10
TRAINGDM, Epoch 100/300, MSE 6.34868e-05/1e-05, Gradient 0.0409749/1e-10
TRAINGDM, Epoch 114/300, MSE 9.06235e-06/1e-05, Gradient 0.00908756/1e-10
TRAINGDM, Performance goal met.
a = sim(net,p)
a =
   -1.0026   -1.0044    0.9969    0.9992


Note that since we reinitialized the weights and biases before training (by calling newff again), we obtain a different mean square error than we did using traingd. If we were to reinitialize and train again using traingdm, we would get yet a different mean square error. The random choice of initial weights and biases will affect the performance of the algorithm. If you want to compare the performance of different algorithms, you should test each using several different sets of initial weights and biases. You may want to use net=init(net) to reinitialize the weights, rather than recreating the entire network with newff.


Try the Neural Network Design Demonstration nnd12mo [HDB96] for an illustration of the performance of the batch momentum algorithm.

Faster Training


The previous section presented two backpropagation training algorithms: gradient descent, and gradient descent with momentum. These two methods are often too slow for practical problems. In this section we discuss several high performance algorithms that can converge from ten to one hundred times faster than the algorithms discussed previously. All of the algorithms in this section operate in the batch mode and are invoked using train.


These faster algorithms fall into two main categories. The first category uses heuristic techniques, which were developed from an analysis of the performance of the standard steepest descent algorithm. One heuristic modification is the momentum technique, which was presented in the previous section. This section discusses two more heuristic techniques: variable learning rate backpropagation, traingda; and resilient backpropagation, trainrp.


The second category of fast algorithms uses standard numerical optimization techniques. (See Chapter 9 of [HDB96] for a review of basic numerical optimization.) Later in this section we present three types of numerical optimization techniques for neural network training: conjugate gradient (traincgf, traincgp, traincgb, trainscg), quasi-Newton (trainbfg, trainoss), and Levenberg-Marquardt (trainlm).


Variable Learning Rate (traingda, traingdx)


With standard steepest descent, the learning rate is held constant throughout training. The performance of the algorithm is very sensitive to the proper setting of the learning rate. If the learning rate is set too high, the algorithm may oscillate and become unstable. If the learning rate is too small, the algorithm will take too long to converge. It is not practical to determine the optimal setting for the learning rate before training, and, in fact, the optimal learning rate changes during the training process, as the algorithm moves across the performance surface.


The performance of the steepest descent algorithm can be improved if we allow the learning rate to change during the training process. An adaptive learning rate will attempt to keep the learning step size as large as possible while keeping learning stable. The learning rate is made responsive to the complexity of the local error surface.


An adaptive learning rate requires some changes in the training procedure used by traingd. First, the initial network output and error are calculated. At each epoch new weights and biases are calculated using the current learning rate. New outputs and errors are then calculated.


As with momentum, if the new error exceeds the old error by more than a predefined ratio max_perf_inc (typically 1.04), the new weights and biases are discarded. In addition, the learning rate is decreased (typically by multiplying by lr_dec = 0.7). Otherwise, the new weights, etc., are kept. If the new error is less than the old error, the learning rate is increased (typically by multiplying by lr_inc = 1.05).


This procedure increases the learning rate, but only to the extent that the network can learn without large error increases. Thus, a near-optimal learning rate is obtained for the local terrain. When a larger learning rate could result in stable learning, the learning rate is increased. When the learning rate is too high to guarantee a decrease in error, it gets decreased until stable learning resumes.
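The accept/reject logic above can be written as a single epoch update. This Python sketch follows the description in the text and uses the typical values max_perf_inc = 1.04, lr_dec = 0.7 and lr_inc = 1.05. The function name and signature are hypothetical, and this is not the traingda implementation itself.

```python
def adaptive_lr_step(x, f, grad, perf, lr,
                     max_perf_inc=1.04, lr_dec=0.7, lr_inc=1.05):
    """One epoch of steepest descent with an adaptive learning rate.

    x: current parameters (list), f: perf(x), grad: gradient at x,
    perf: performance function.  Returns (parameters, performance,
    learning rate to use for the next epoch).
    """
    x_new = [xi - lr * gi for xi, gi in zip(x, grad)]
    f_new = perf(x_new)
    if f_new > f * max_perf_inc:
        # Error grew by more than the allowed ratio: discard the step
        # and reduce the learning rate.
        return x, f, lr * lr_dec
    if f_new < f:
        # Error decreased: keep the step and try a larger learning rate.
        return x_new, f_new, lr * lr_inc
    # Error rose, but within max_perf_inc: keep the step, leave lr alone.
    return x_new, f_new, lr
```

To get the full training loop, call this function once per epoch, recomputing grad at the returned parameters each time. If a step raises the error by less than max_perf_inc, the step is kept and lr is left unchanged, as the rule above states.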


Try the Neural Network Design Demonstration nnd12v1 [HDB96] for an illustration of the performance of the variable learning rate algorithm.


Backpropagation training with an adaptive learning rate is implemented with the function traingda, which is called just like traingd, except for the additional training parameters max_perf_inc, lr_dec, and lr_inc. Here is how it is called to train our previous two-layer network:


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'traingda');
net.trainParam.show = 50;
net.trainParam.lr = 0.05;
net.trainParam.lr_inc = 1.05;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINGDA, Epoch 0/300, MSE 1.71149/1e-05, Gradient 2.6397/1e-06
TRAINGDA, Epoch 44/300, MSE 7.47952e-06/1e-05, Gradient 0.00251265/1e-06
TRAINGDA, Performance goal met.
a = sim(net,p)
a =
   -1.0036   -0.9960    1.0008    0.9991
The function traingdx combines adaptive learning rate with momentum training. It is invoked in the same way as traingda, except that it has the momentum coefficient mc as an additional training parameter.


Resilient Backpropagation (trainrp)


Multilayer networks typically use sigmoid transfer functions in the hidden layers. These functions are often called "squashing" functions, since they compress an infinite input range into a finite output range. Sigmoid functions are characterized by the fact that their slope must approach zero as the input gets large. This causes a problem when using steepest descent to train a multilayer network with sigmoid functions, since the gradient can have a very small magnitude and therefore cause small changes in the weights and biases, even though the weights and biases are far from their optimal values.


The purpose of the resilient backpropagation (Rprop) training algorithm is to eliminate these harmful effects of the magnitudes of the partial derivatives. Only the sign of the derivative is used to determine the direction of the weight update; the magnitude of the derivative has no effect on the weight update. The size of the weight change is determined by a separate update value. The update value for each weight and bias is increased by a factor delt_inc whenever the derivative of the performance function with respect to that weight has the same sign for two successive iterations. The update value is decreased by a factor delt_dec whenever the derivative with respect to that weight changes sign from the previous iteration. If the derivative is zero, then the update value remains the same. Whenever the weights are oscillating, the weight change will be reduced. If the weight continues to change in the same direction for several iterations, then the magnitude of the weight change will be increased. A complete description of the Rprop algorithm is given in [ReBr93].
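The sign-based update can be sketched in Python as follows. The sketch follows the rules stated above: grow the update value by delt_inc when the signs agree, shrink it by delt_dec on a sign change, and keep it when the derivative is zero. It also caps the update value at deltamax. The numeric defaults (1.2, 0.5, 50, and an initial update value of 0.07) are illustrative assumptions. Refinements described in [ReBr93] are omitted.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_step(w, g, g_prev, delta, delt_inc=1.2, delt_dec=0.5, deltamax=50.0):
    """One Rprop update.

    w: weights, g: current derivatives, g_prev: derivatives from the
    previous iteration, delta: per-weight update values (all lists).
    Returns (new weights, new update values).
    """
    new_w, new_delta = [], []
    for wi, gi, gpi, di in zip(w, g, g_prev, delta):
        if gi * gpi > 0:             # same sign twice: grow the update value
            di = min(di * delt_inc, deltamax)
        elif gi * gpi < 0:           # sign changed: shrink the update value
            di = di * delt_dec
        # gi * gpi == 0: update value unchanged
        new_delta.append(di)
        new_w.append(wi - sign(gi) * di)   # only the sign of the derivative is used
    return new_w, new_delta


def rprop_minimize_quadratic(target, iters=200, delta0=0.07):
    """Demo: minimize sum((w_i - t_i)**2) from w = 0 with rprop_step."""
    w = [0.0] * len(target)
    delta = [delta0] * len(target)
    g_prev = [0.0] * len(target)
    for _ in range(iters):
        g = [2.0 * (wi - ti) for wi, ti in zip(w, target)]
        w, delta = rprop_step(w, g, g_prev, delta)
        g_prev = g
    return w
```

On the first call, g_prev is zero, so each update value stays at delta0. The first move therefore has size delta0, in the direction opposite to the sign of the derivative.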


In the following code we recreate our previous network and train it using the Rprop algorithm. The training parameters for trainrp are epochs, show, goal, time, min_grad, max_fail, delt_inc, delt_dec, delta0, deltamax. We have previously discussed the first eight parameters. The last two are the initial step size and the maximum step size, respectively. The performance of Rprop is not very sensitive to the settings of the training parameters. For the example below, we leave most of the training parameters at the default values. We do reduce show below our previous value, because Rprop generally converges much faster than the previous algorithms.


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'trainrp');
net.trainParam.show = 10;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINRP, Epoch 0/300, MSE 0.469151/1e-05, Gradient 1.4258/1e-06
TRAINRP, Epoch 10/300, MSE 0.000789506/1e-05, Gradient 0.0554529/1e-06
TRAINRP, Epoch 20/300, MSE 7.13065e-06/1e-05, Gradient 0.00346986/1e-06
TRAINRP, Performance goal met.
a = sim(net,p)
a =
   -1.0026   -0.9963    0.9978    1.0017


Rprop is generally much faster than the standard steepest descent algorithm. It also has the nice property that it requires only a modest increase in memory requirements. We do need to store the update values for each weight and bias, which is equivalent to storage of the gradient.


Conjugate Gradient Algorithms


The basic backpropagation algorithm adjusts the weights in the steepest descent direction (negative of the gradient). This is the direction in which the performance function is decreasing most rapidly. It turns out that, although the function decreases most rapidly along the negative of the gradient, this does not necessarily produce the fastest convergence. In the conjugate gradient algorithms a search is performed along conjugate directions, which generally produces faster convergence than steepest descent directions. In this section, we present four different variations of conjugate gradient algorithms.


See page 12-14 of [HDB96] for a discussion of conjugate gradient algorithms and their application to neural networks.


In most of the training algorithms that we discussed up to this point, a learning rate is used to determine the length of the weight update (step size). In most of the conjugate gradient algorithms, the step size is adjusted at each iteration. A search is made along the conjugate gradient direction to determine the step size which minimizes the performance function along that line. There are five different search functions included in the toolbox, and these are discussed at the end of this section. Any of these search functions can be used interchangeably with a variety of the training functions described in the remainder of this chapter. Some search functions are best suited to certain training functions, although the optimum choice can vary according to the specific application. An appropriate default search function is assigned to each training function, but this can be modified by the user.


Fletcher-Reeves Update (traincgf)


All of the conjugate gradient algorithms start out by searching in the steepest descent direction (negative of the gradient) on the first iteration:


$$\mathbf{p}_0 = -\mathbf{g}_0$$


A line search is then performed to determine the optimal distance to move along the current search direction:


$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$$


Then the next search direction is determined so that it is conjugate to previous search directions. The general procedure for determining the new search direction is to combine the new steepest descent direction with the previous search direction:


$$\mathbf{p}_k = -\mathbf{g}_k + \beta_k \mathbf{p}_{k-1}$$


The various versions of conjugate gradient are distinguished by the manner in which the constant $\beta_k$ is computed. For the Fletcher-Reeves update the procedure is

$$\beta_k = \frac{\mathbf{g}_k^T \mathbf{g}_k}{\mathbf{g}_{k-1}^T \mathbf{g}_{k-1}}$$


This is the ratio of the norm squared of the current gradient to the norm squared of the previous gradient.
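In code, the Fletcher-Reeves direction update is a ratio of two dot products followed by a vector combination. The following is a minimal Python sketch on plain lists; the function names are hypothetical. The first iteration, where p0 = -g0, is left to the caller.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def fletcher_reeves_direction(g, g_prev, p_prev):
    """Return (p_k, beta_k) for the Fletcher-Reeves update.

    beta_k = g_k.g_k / g_{k-1}.g_{k-1}
    p_k    = -g_k + beta_k * p_{k-1}
    """
    beta = dot(g, g) / dot(g_prev, g_prev)
    p = [-gi + beta * pi for gi, pi in zip(g, p_prev)]
    return p, beta
```

For example, with g_k = [1, 2], g_{k-1} = [2, 0] and p_{k-1} = [-2, 0], beta_k = 5/4 and p_k = [-3.5, -2].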


See [FlRe64] or [HDB96] for a discussion of the Fletcher-Reeves conjugate gradient algorithm.


In the following code, we reinitialize our previous network and retrain it using the Fletcher-Reeves version of the conjugate gradient algorithm. The training parameters for traincgf are epochs, show, goal, time, min_grad, max_fail, srchFcn, scal_tol, alpha, beta, delta, gama, low_lim, up_lim, maxstep, minstep, bmax. We have previously discussed the first six parameters. The parameter srchFcn is the name of the line search function. It can be any of the functions described later in this section (or a user-supplied function). The remaining parameters are associated with specific line search routines and are described later in this section. The default line search routine srchcha is used in this example. traincgf generally converges in fewer iterations than trainrp (although there is more computation required in each iteration).


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'traincgf');
net.trainParam.show = 5;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINCGF-srchcha, Epoch 0/300, MSE 2.15911/1e-05, Gradient 3.17681/1e-06
TRAINCGF-srchcha, Epoch 5/300, MSE 0.111081/1e-05, Gradient 0.602109/1e-06
TRAINCGF-srchcha, Epoch 10/300, MSE 0.0095015/1e-05, Gradient 0.197436/1e-06
TRAINCGF-srchcha, Epoch 15/300, MSE 0.000508668/1e-05, Gradient 0.0439273/1e-06
TRAINCGF-srchcha, Epoch 17/300, MSE 1.33611e-06/1e-05, Gradient 0.00562836/1e-06
TRAINCGF, Performance goal met.
a = sim(net,p)
a =
   -1.0001   -1.0023    0.9999    1.0002


The conjugate gradient algorithms are usually much faster than variable learning rate backpropagation, and are sometimes faster than trainrp, although the results will vary from one problem to another. The conjugate gradient algorithms require only a little more storage than the simpler algorithms, so they are often a good choice for networks with a large number of weights.


Try the Neural Network Design Demonstration nnd12cg [HDB96] for an illustration of the performance of a conjugate gradient algorithm.

Polak-Ribière Update (traincgp)

Another version of the conjugate gradient algorithm was proposed by Polak and Ribière. As with the Fletcher-Reeves algorithm, the search direction at each iteration is determined by


$$\mathbf{p}_k = -\mathbf{g}_k + \beta_k \mathbf{p}_{k-1}$$


For the Polak-Ribière update, the constant $\beta_k$ is computed by

$$\beta_k = \frac{\Delta\mathbf{g}_{k-1}^T \mathbf{g}_k}{\mathbf{g}_{k-1}^T \mathbf{g}_{k-1}}$$
This is the inner product of the previous change in the gradient with the current gradient divided by the norm squared of the previous gradient. See [FlRe64] or [HDB96] for a discussion of the Polak-Ribière conjugate gradient algorithm.
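The Polak-Ribière rule differs from Fletcher-Reeves only in the numerator of beta_k. A minimal Python sketch with hypothetical function names, taking Delta g_{k-1} = g_k - g_{k-1}:

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def polak_ribiere_direction(g, g_prev, p_prev):
    """Return (p_k, beta_k) for the Polak-Ribiere update.

    beta_k = (g_k - g_{k-1}).g_k / g_{k-1}.g_{k-1}
    p_k    = -g_k + beta_k * p_{k-1}
    """
    dg = [a - b for a, b in zip(g, g_prev)]   # Delta g_{k-1}
    beta = dot(dg, g) / dot(g_prev, g_prev)
    p = [-gi + beta * pi for gi, pi in zip(g, p_prev)]
    return p, beta
```

For g_k = [1, 2], g_{k-1} = [2, 0] and p_{k-1} = [-2, 0], the numerator is 3 rather than the Fletcher-Reeves value of 5. This gives beta_k = 0.75 and p_k = [-2.5, -2].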


In the following code, we recreate our previous network and train it using the Polak-Ribière version of the conjugate gradient algorithm. The training parameters for traincgp are the same as those for traincgf. The default line search routine srchcha is used in this example. The parameters show and epochs are set to the same values as they were for traincgf.


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'traincgp');
net.trainParam.show = 5;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINCGP-srchcha, Epoch 0/300, MSE 1.21966/1e-05, Gradient 1.77008/1e-06
TRAINCGP-srchcha, Epoch 5/300, MSE 0.227447/1e-05, Gradient 0.86507/1e-06
TRAINCGP-srchcha, Epoch 10/300, MSE 0.000237395/1e-05, Gradient 0.0174276/1e-06
TRAINCGP-srchcha, Epoch 15/300, MSE 9.28243e-05/1e-05, Gradient 0.00485746/1e-06
TRAINCGP-srchcha, Epoch 20/300, MSE 1.46146e-05/1e-05, Gradient 0.000912838/1e-06
TRAINCGP-srchcha, Epoch 25/300, MSE 1.05893e-05/1e-05, Gradient 0.00238173/1e-06
TRAINCGP-srchcha, Epoch 26/300, MSE 9.10561e-06/1e-05, Gradient 0.00197441/1e-06
TRAINCGP, Performance goal met.
a = sim(net,p)
a =
   -0.9967   -1.0018    0.9958    1.0022


The traincgp routine has performance similar to traincgf. It is difficult to predict which algorithm will perform best on a given problem. The storage requirements for Polak-Ribière (four vectors) are slightly larger than for Fletcher-Reeves (three vectors).


Powell-Beale Restarts (traincgb) 

For all conjugate gradient algorithms, the search direction will be periodically reset to the negative of the gradient. The standard reset point occurs when the number of iterations is equal to the number of network parameters (weights and biases), but there are other reset methods that can improve the efficiency of training. One such reset method was proposed by Powell [Powe77], based on an earlier version proposed by Beale [Beal72]. For this technique we will restart if there is very little orthogonality left between the current gradient and the previous gradient. This is tested with the following inequality:


$$\left|\mathbf{g}_{k-1}^T \mathbf{g}_k\right| \ge 0.2\,\left\|\mathbf{g}_k\right\|^2$$

If this condition is satisfied, the search direction is reset to the negative of the gradient.
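The restart test is cheap to evaluate. In this Python sketch, needs_restart checks the inequality above. The helper next_direction is hypothetical. It takes beta as an argument, so any of the update rules above can supply it.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def needs_restart(g, g_prev, ratio=0.2):
    """Powell-Beale test: restart when |g_{k-1}.g_k| >= ratio * ||g_k||^2."""
    return abs(dot(g_prev, g)) >= ratio * dot(g, g)


def next_direction(g, g_prev, p_prev, beta):
    """Hypothetical helper: -g_k on restart, otherwise -g_k + beta * p_{k-1}."""
    if needs_restart(g, g_prev):
        return [-gi for gi in g]
    return [-gi + beta * pi for gi, pi in zip(g, p_prev)]
```

Two successive gradients that point the same way trigger a restart. Orthogonal gradients do not trigger one.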


In the following code, we recreate our previous network and train it using the Powell-Beale version of the conjugate gradient algorithm. The training parameters for traincgb are the same as those for traincgf. The default line search routine srchcha is used in this example. The parameters show and epochs are set to the same values as they were for traincgf.


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'traincgb');
net.trainParam.show = 5;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINCGB-srchcha, Epoch 0/300, MSE 2.5245/1e-05, Gradient 3.66882/1e-06
TRAINCGB-srchcha, Epoch 5/300, MSE 4.86255e-07/1e-05, Gradient 0.00145878/1e-06
TRAINCGB, Performance goal met.
a = sim(net,p)
a =
   -0.9997   -0.9998    1.0000    1.0014


The traincgb routine has performance that is somewhat better than traincgp for some problems, although performance on any given problem is difficult to predict. The storage requirements for the Powell-Beale algorithm (six vectors) are slightly larger than for Polak-Ribière (four vectors).


Scaled Conjugate Gradient (trainscg) 

Each of the conjugate gradient algorithms that we have discussed so far requires a line search at each iteration. This line search is computationally expensive, since it requires that the network response to all training inputs be computed several times for each search. The scaled conjugate gradient algorithm (SCG), developed by Moller [Moll93], was designed to avoid the time-consuming line search. This algorithm is too complex to explain in a few lines, but the basic idea is to combine the model-trust region approach (used in the Levenberg-Marquardt algorithm described later), with the conjugate gradient approach. See [Moll93] for a detailed explanation of the algorithm.


In the following code, we reinitialize our previous network and retrain it using the scaled conjugate gradient algorithm. The training parameters for trainscg are epochs, show, goal, time, min_grad, max_fail, sigma, lambda. We have previously discussed the first six parameters. The parameter sigma determines the change in the weight for the second derivative approximation. The parameter lambda regulates the indefiniteness of the Hessian. The parameters show and epochs are set to 10 and 300, respectively.


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'trainscg');
net.trainParam.show = 10;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINSCG, Epoch 0/300, MSE 4.17697/1e-05, Gradient 5.32455/1e-06
TRAINSCG, Epoch 10/300, MSE 2.09505e-05/1e-05, Gradient 0.00673703/1e-06
TRAINSCG, Epoch 11/300, MSE 9.38923e-06/1e-05, Gradient 0.0049926/1e-06
TRAINSCG, Performance goal met.
a = sim(net,p)
a =
   -1.0057   -1.0008    1.0019    1.0005


The trainscg routine may require more iterations to converge than the other conjugate gradient algorithms, but the number of computations in each iteration is significantly reduced because no line search is performed. The storage requirements for the scaled conjugate gradient algorithm are about the same as those of Fletcher-Reeves.


Line Search Routines


Several of the conjugate gradient and quasi-Newton algorithms require that a line search be performed. In this section, we describe five different line searches you can use. To use any of these search routines, you simply set the training parameter srchFcn equal to the name of the desired search function, as described in previous sections. It is often difficult to predict which of these routines provides the best results for any given problem, but we set the default search function to an appropriate initial choice for each training function, so you may never need to modify this parameter.


Golden Section Search (srchgol)


The golden section search srchgol is a linear search that does not require the calculation of the slope. This routine begins by locating an interval in which the minimum of the performance occurs. This is accomplished by evaluating the performance at a sequence of points, starting at a distance of delta and doubling in distance each step, along the search direction. When the performance increases between two successive iterations, a minimum has been bracketed. The next step is to reduce the size of the interval containing the minimum. Two new points are located within the initial interval. The values of the performance at these two points determine a section of the interval that can be discarded, and a new interior point is placed within the new interval. This procedure is continued until the interval of uncertainty is reduced to a width of tol, which is equal to delta/scale_tol.
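Both phases fit in a short Python sketch: first bracketing the minimum by doubling the step, then shrinking the bracket by golden section. The sketch assumes that the performance along the search direction is a function phi(step) and that phi is unimodal on the bracket. The function names, and the scale_tol value used in the example, are illustrative; this is not srchgol's actual code.

```python
import math

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618


def bracket_minimum(phi, delta, max_steps=60):
    """Evaluate phi at 0, delta, 2*delta, 4*delta, ... until it increases.

    Returns an interval (a, c) that brackets a minimum of phi.
    """
    a, b = 0.0, delta
    fa, fb = phi(a), phi(b)
    if fb > fa:                      # increase on the very first step
        return a, b
    for _ in range(max_steps):
        c = 2.0 * b
        fc = phi(c)
        if fc > fb:                  # performance went up: minimum bracketed
            return a, c
        a, b, fb = b, c, fc
    raise RuntimeError("no bracket found; phi may be unbounded below")


def golden_section_search(phi, delta, scale_tol):
    """Shrink the bracket until its width is at most tol = delta / scale_tol,
    then return the midpoint of the final interval."""
    tol = delta / scale_tol
    a, b = bracket_minimum(phi, delta)
    c = b - GOLDEN * (b - a)
    d = a + GOLDEN * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                  # minimum lies in [a, d]: drop (d, b]
            b, d, fd = d, c, fc
            c = b - GOLDEN * (b - a)
            fc = phi(c)
        else:                        # minimum lies in [c, b]: drop [a, c)
            a, c, fc = c, d, fd
            d = a + GOLDEN * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)
```

Take phi(s) = (s - 3)**2 and delta = 0.5 as an example. The doubling phase evaluates 0, 0.5, 1, 2, 4, 8 and returns the bracket [2, 8]. With scale_tol = 20 the search then stops once the interval is at most 0.025 wide.

Each reduction reuses one interior point, because GOLDEN**2 = 1 - GOLDEN. As a result, only one new performance evaluation is needed per iteration.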


See [HDB96], starting on page 12-16, for a complete description of the golden section search. Try the Neural Network Design Demonstration nnd12sd1l [HDB96] for an illustration of the performance of the golden section search in combination with a conjugate gradient algorithm.


Brent's Search (srchbre) 


Brent's search is a linear search, which is a hybrid combination of the golden section search and a quadratic interpolation. Function comparison methods, like the golden section search, have a first-order rate of convergence, while polynomial interpolation methods have a superlinear asymptotic rate of convergence. On the other hand, the rate of convergence for the golden section search starts when the algorithm is initialized, whereas the asymptotic behavior for the polynomial interpolation methods may take many iterations to become apparent. Brent's search attempts to combine the best features of both approaches.


For Brent's search we begin with the same interval of uncertainty that we used with the golden section search, but some additional points are computed. A quadratic function is then fitted to these points and the minimum of the quadratic function is computed. If this minimum is within the appropriate interval of uncertainty, it is used in the next stage of the search and a new quadratic approximation is performed. If the minimum falls outside the known interval of uncertainty, then a step of the golden section search is performed.
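The central decision in Brent's search can be sketched as below: take the parabolic step or fall back to a golden section step. This Python fragment shows only that choice. Brent's bookkeeping of the best points, the check that the parabola opens upward, and his step-size safeguards (see [Bren73]) are all omitted. The function names are hypothetical.

```python
import math

# Fraction of the interval used for a golden-section point, (3 - sqrt(5)) / 2 ~ 0.382.
GOLDEN_FRACTION = (3.0 - math.sqrt(5.0)) / 2.0


def parabola_vertex(a, fa, b, fb, c, fc):
    """Abscissa of the vertex of the parabola through (a, fa), (b, fb), (c, fc).

    Returns None when the points are collinear (no unique vertex).
    """
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0.0:
        return None
    return b - 0.5 * num / den


def next_trial_point(lo, hi, pts):
    """Pick the next point inside the interval of uncertainty [lo, hi].

    pts holds three (x, f(x)) pairs.  Use the parabola's vertex if it lies
    strictly inside (lo, hi); otherwise fall back to a golden-section point.
    """
    (a, fa), (b, fb), (c, fc) = pts
    x = parabola_vertex(a, fa, b, fb, c, fc)
    if x is not None and lo < x < hi:
        return x, "parabolic"
    return lo + GOLDEN_FRACTION * (hi - lo), "golden"
```

For the points (0, 4), (1, 1) and (3, 1), which lie on (x - 2)**2, the parabola's vertex is exactly 2.0. That vertex is accepted when the interval is [0, 3]. If the interval is [0, 1.5], the vertex falls outside it and a golden-section point is used instead.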


See [Bren73] for a complete description of this algorithm. This algorithm has the advantage that it does not require computation of the derivative. The derivative computation requires a backpropagation through the network, which involves more computation than a forward pass. However, the algorithm may require more performance evaluations than algorithms that use derivative information.


Hybrid Bisection-Cubic Search (srchhyb) 


Like Brent's search, srchhyb is a hybrid algorithm. It is a combination of bisection and cubic interpolation. For the bisection algorithm, one point is located in the interval of uncertainty and the performance and its derivative are computed. Based on this information, half of the interval of uncertainty is discarded. In the hybrid algorithm, a cubic interpolation of the function is obtained by using the value of the performance and its derivative at the two end points. If the minimum of the cubic interpolation falls within the known interval of uncertainty, then it is used to reduce the interval of uncertainty. Otherwise, a step of the bisection algorithm is used.
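A minimal Python sketch of the hybrid idea follows. It assumes that the starting interval [a, b] brackets a minimum, meaning the derivative is negative at a and positive at b. The cubic minimizer uses the standard Hermite-interpolation formula, built from the values and derivatives at the two end points. The names and tolerances are illustrative; this is not srchhyb's code.

```python
import math


def cubic_min(x0, f0, g0, x1, f1, g1):
    """Minimizer of the cubic that matches f and f' at x0 < x1, or None."""
    d1 = g0 + g1 - 3.0 * (f0 - f1) / (x0 - x1)
    disc = d1 * d1 - g0 * g1
    if disc < 0.0:
        return None
    d2 = math.sqrt(disc)                 # sign(x1 - x0) is +1 because x0 < x1
    denom = g1 - g0 + 2.0 * d2
    if denom == 0.0:
        return None
    return x1 - (x1 - x0) * (g1 + d2 - d1) / denom


def bisection_cubic_search(f, df, a, b, tol=1e-8, max_iter=100):
    """Hybrid bisection / cubic interpolation on [a, b] with df(a) < 0 < df(b)."""
    fa, ga, fb, gb = f(a), df(a), f(b), df(b)
    for _ in range(max_iter):
        t = cubic_min(a, fa, ga, b, fb, gb)
        if t is None or not (a < t < b):
            t = 0.5 * (a + b)            # cubic minimum unusable: bisection step
        ft, gt = f(t), df(t)
        if abs(gt) < tol or b - a < tol:
            return t
        if gt > 0.0:                     # minimum lies to the left of t
            b, fb, gb = t, ft, gt
        else:                            # minimum lies to the right of t
            a, fa, ga = t, ft, gt
    return 0.5 * (a + b)
```

When the function is itself a cubic, the first interpolation lands exactly on the minimum. For example, for f(x) = x**3 - 3x on [0, 2] it returns 1.0 after one step.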


See [Scal85] for a complete description of the hybrid bisection-cubic search. This algorithm does require derivative information, so it performs more computations at each step of the algorithm than the golden section search or Brent's algorithm.


Charalambous' Search (srchcha) 


The method of Charalambous, srchcha, was designed to be used in combination with a conjugate gradient algorithm for neural network training. Like the previous two methods, it is a hybrid search. It uses a cubic interpolation, together with a type of sectioning.


See [Char92] for a description of Charalambous' search. We have used this routine as the default search for most of the conjugate gradient algorithms, since it appears to produce excellent results for many different problems. It does require the computation of the derivatives (backpropagation) in addition to the computation of performance, but it overcomes this limitation by locating the minimum with fewer steps. This is not true for all problems, and you may want to experiment with other line searches.


Backtracking (srchbac) 


The backtracking search routine srchbac is best suited to use with the quasi-Newton optimization algorithms. It begins with a step multiplier of 1 and then backtracks until an acceptable reduction in the performance is obtained. On the first step it uses the value of performance at the current point and at a step multiplier of 1. It also uses the value of the derivative of performance at the current point to obtain a quadratic approximation to the performance function along the search direction. The minimum of the quadratic approximation becomes a tentative optimum point (under certain conditions) and the performance at this point is tested. If the performance is not sufficiently reduced, a cubic interpolation is obtained and the minimum of the cubic interpolation becomes the new tentative optimum point. This process is continued until a sufficient reduction in the performance is obtained.
The backtracking algorithm is described in [DeSc83]. It was used as the default line search for the quasi-Newton algorithms, although it may not be the best technique for all problems.
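The quadratic-then-cubic backtracking can be sketched in Python in the spirit of the algorithm in [DeSc83]. Here phi(lam) is the performance at step multiplier lam along the search direction. Two details are assumptions of this sketch rather than values taken from srchbac, and they stand in for the "certain conditions" mentioned above:

- the sufficient-decrease constant alpha = 1e-4;
- the rule keeping each new lam between 0.1 and 0.5 times the previous one.

```python
import math


def backtrack(phi, phi0, dphi0, alpha=1e-4, max_iter=50):
    """Backtracking line search along a descent direction (dphi0 < 0).

    phi(lam) is the performance at step multiplier lam, phi0 = phi(0) and
    dphi0 = phi'(0).  Returns the first lam that satisfies the
    sufficient-decrease test phi(lam) <= phi0 + alpha * lam * dphi0.
    """
    lam, f_lam = 1.0, phi(1.0)
    lam_prev = f_prev = None
    for _ in range(max_iter):
        if f_lam <= phi0 + alpha * lam * dphi0:
            return lam
        if lam_prev is None:
            # First backtrack: minimum of the quadratic matching phi0, dphi0, phi(lam).
            new = -dphi0 * lam * lam / (2.0 * (f_lam - phi0 - dphi0 * lam))
        else:
            # Later backtracks: cubic a*l^3 + b*l^2 + dphi0*l + phi0 through
            # the two most recent trial points.
            r1 = f_lam - phi0 - dphi0 * lam
            r2 = f_prev - phi0 - dphi0 * lam_prev
            a = (r1 / lam ** 2 - r2 / lam_prev ** 2) / (lam - lam_prev)
            b = (-lam_prev * r1 / lam ** 2 + lam * r2 / lam_prev ** 2) / (lam - lam_prev)
            disc = b * b - 3.0 * a * dphi0
            denom = b + math.sqrt(disc) if disc >= 0.0 else 0.0
            # -dphi0 / denom equals (-b + sqrt(disc)) / (3a) but stays finite when a ~ 0.
            new = -dphi0 / denom if denom > 0.0 else 0.5 * lam
        new = min(max(new, 0.1 * lam), 0.5 * lam)   # safeguard the step reduction
        lam_prev, f_prev = lam, f_lam
        lam, f_lam = new, phi(new)
    return lam
```

Take phi(lam) = 10*(lam - 0.3)**2 as an example. The full step lam = 1 fails the test, and the first quadratic fit already lands on 0.3. A sharper minimum closer to zero exercises the cubic branch, because the quadratic estimate is first clipped to 0.1.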


Quasi-Newton Algorithms 


BFGS Algorithm (trainbfg)
Newton's method is an alternative to the conjugate gradient methods for fast optimization. The basic step of Newton's method is

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\mathbf{g}_k$$


where $\mathbf{A}_k$ is the Hessian matrix (second derivatives) of the performance index at the current values of the weights and biases. Newton's method often converges faster than conjugate gradient methods. Unfortunately, it is complex and expensive to compute the Hessian matrix for feedforward neural networks. There is a class of algorithms that is based on Newton's method, but which doesn't require calculation of second derivatives. These are called quasi-Newton (or secant) methods. They update an approximate Hessian matrix at each iteration of the algorithm. The update is computed as a function of the gradient. The quasi-Newton method that has been most successful in published studies is the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) update. This algorithm has been implemented in the trainbfg routine.
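The BFGS update can be written using only outer products and matrix products. This Python sketch uses the inverse-Hessian form of the update, which gives the search direction -H g without solving a linear system. This section does not say whether trainbfg stores the Hessian or its inverse, so treat that choice as an assumption. Function names are hypothetical.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def bfgs_inverse_update(H, s, y):
    """BFGS update of an inverse-Hessian approximation H.

    s = x_{k+1} - x_k, y = g_{k+1} - g_k.  Returns
    (I - rho s y^T) H (I - rho y s^T) + rho s s^T with rho = 1 / (y^T s).
    """
    n = len(s)
    rho = 1.0 / sum(si * yi for si, yi in zip(s, y))
    sy, ys, ss = outer(s, y), outer(y, s), outer(s, s)
    left = [[(1.0 if i == j else 0.0) - rho * sy[i][j] for j in range(n)] for i in range(n)]
    right = [[(1.0 if i == j else 0.0) - rho * ys[i][j] for j in range(n)] for i in range(n)]
    core = matmul(matmul(left, H), right)
    return [[core[i][j] + rho * ss[i][j] for j in range(n)] for i in range(n)]


def search_direction(H, g):
    """Quasi-Newton direction p = -H g (H approximates the inverse Hessian)."""
    return [-sum(hij * gj for hij, gj in zip(row, g)) for row in H]
```

Take s = [1, 0] and y = [2, 1], starting from the identity. The updated matrix is [[0.75, -0.5], [-0.5, 1.0]], and it satisfies the secant condition H_new y = s that defines quasi-Newton updates. The matrix holds n × n entries, which is the storage cost the next paragraphs discuss.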


In the following code, we reinitialize our previous network and retrain it using the BFGS quasi-Newton algorithm. The training parameters for trainbfg are the same as those for traincgf. The default line search routine srchbac is used in this example. The parameters show and epochs are set to 5 and 300, respectively.


p = [-1 -1 2 2;0 5 0 5];
t = [-1 -1 1 1];
net=newff(minmax(p),[3,1],{'tansig','purelin'},'trainbfg');
net.trainParam.show = 5;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr]=train(net,p,t);
TRAINBFG-srchbac, Epoch 0/300, MSE 0.492231/1e-05, Gradient 2.16307/1e-06
TRAINBFG-srchbac, Epoch 5/300, MSE 0.000744953/1e-05, Gradient 0.0196826/1e-06
TRAINBFG-srchbac, Epoch 8/300, MSE 7.69867e-06/1e-05, Gradient 0.00497404/1e-06
TRAINBFG, Performance goal met.
a = sim(net,p)
a =
   -0.9995   -1.0004    1.0008    0.9945


The BFGS algorithm is described in [DeSc83]. This algorithm requires more computation in each iteration and more storage than the conjugate gradient methods, although it generally converges in fewer iterations. The approximate Hessian must be stored, and its dimension is n × n, where n is equal to the number of weights and biases in the network. For very large networks it may be better to use Rprop or one of the conjugate gradient algorithms. For smaller networks, however, trainbfg can be an efficient training function.


One Step Secant Algorithm (trainoss) 


Since the BFGS algorithm requires more storage and computation in each
iteration than the conjugate gradient algorithms, there is need for a secant
approximation with smaller storage and computation requirements. The one
step secant (OSS) method is an attempt to bridge the gap between the
conjugate gradient algorithms and the quasi-Newton (secant) algorithms. This
algorithm does not store the complete Hessian matrix; it assumes that at each
iteration, the previous Hessian was the identity matrix. This has the additional
advantage that the new search direction can be calculated without computing
a matrix inverse.
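The following minimal Python sketch forms a one-step secant direction under the assumption described above: the BFGS inverse-Hessian update is applied to an identity matrix and then multiplied by -g. Expanding that product leaves only dot products, so neither an $n \times n$ matrix nor an inverse is needed. The sketch follows from this description rather than from the trainoss source, and the vectors are arbitrary illustrative values.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def oss_direction(g, s, y):
    """One-step secant search direction.

    g: current gradient g_k
    s: previous parameter change x_k - x_{k-1}
    y: gradient change g_k - g_{k-1}
    Equals -H g, where H is the BFGS inverse-Hessian update applied to the
    identity matrix; only O(n) dot products are required.
    """
    sy, sg, yg, yy = dot(s, y), dot(s, g), dot(y, g), dot(y, y)
    a = -(1.0 + yy / sy) * sg / sy + yg / sy   # coefficient on s
    b = sg / sy                                 # coefficient on y
    return [-gi + a * si + b * yi for gi, si, yi in zip(g, s, y)]


g = [0.3, -1.2, 0.5]
s = [0.1, 0.4, -0.2]
y = [0.5, 0.9, -0.1]
direction = oss_direction(g, s, y)
print(direction, dot(direction, g))  # the dot product is negative: a descent direction
```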


In the following code, we reinitialize our previous network and retrain it using
the one-step secant algorithm. The training parameters for trainoss are the
same as those for traincgf. The default line search routine srchbac is used in
this example. The parameters show and epochs are set to 5 and 300,
respectively.


p = [-1 -1 2 2; 0 5 0 5];
t = [-1 -1 1 1];
net = newff(minmax(p),[3,1],{'tansig','purelin'},'trainoss');
net.trainParam.show = 5;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr] = train(net,p,t);
TRAINOSS-srchbac, Epoch 0/300, MSE 0.665136/1e-05, Gradient 1.61966/1e-06
TRAINOSS-srchbac, Epoch 5/300, MSE 0.000321921/1e-05, Gradient 0.0261425/1e-06
TRAINOSS-srchbac, Epoch 7/300, MSE 7.85697e-06/1e-05, Gradient 0.00527342/1e-06
TRAINOSS, Performance goal met.
a = sim(net,p)
a =
   -1.0035   -0.9958    1.0014    0.9997


The one step secant method is described in [Batt92]. This algorithm requires 
less storage and computation per epoch than the BFGS algorithm. It requires 
slightly more storage and computation per epoch than the conjugate gradient 
algorithms. It can be considered a compromise between full quasi-Newton 
algorithms and conjugate gradient algorithms. 


Levenberg-Marquardt (trainlm) 


Like the quasi-Newton methods, the Levenberg-Marquardt algorithm was
designed to approach second-order training speed without having to compute
the Hessian matrix. When the performance function has the form of a sum of
squares (as is typical in training feedforward networks), then the Hessian
matrix can be approximated as

$$\mathbf{H} = \mathbf{J}^T\mathbf{J}$$

and the gradient can be computed as

$$\mathbf{g} = \mathbf{J}^T\mathbf{e}$$


where $\mathbf{J}$ is the Jacobian matrix that contains first derivatives of the network
errors with respect to the weights and biases, and $\mathbf{e}$ is a vector of network
errors. The Jacobian matrix can be computed through a standard
backpropagation technique (see [HaMe94]) that is much less complex than
computing the Hessian matrix.
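To make this concrete, the Python sketch below builds $\mathbf{J}$ for a single tanh neuron, forms $\mathbf{J}^T\mathbf{J}$ and $\mathbf{J}^T\mathbf{e}$, and can be checked against a finite-difference gradient. The model, data and starting point are hypothetical. If the performance index is taken as ½Σe², then $\mathbf{J}^T\mathbf{e}$ is its exact gradient. If the index is instead Σe² or the mean of e², both quantities pick up the same constant factor, and that factor cancels in a Newton step.

```python
import math

# Hypothetical one-neuron model a = tanh(w*p + b) with parameters x = [w, b].
P = [-1.0, -1.0, 2.0, 2.0]
T = [-1.0, -1.0, 1.0, 1.0]


def errors(x):
    w, b = x
    return [t - math.tanh(w * p + b) for p, t in zip(P, T)]


def jacobian(x):
    """One row per training case: [de/dw, de/db]."""
    w, b = x
    rows = []
    for p in P:
        a = math.tanh(w * p + b)
        da = 1.0 - a * a             # derivative of tanh w.r.t. its net input
        rows.append([-da * p, -da])  # e = t - a, hence the minus signs
    return rows


def gauss_newton_terms(x):
    J, e = jacobian(x), errors(x)
    n = len(x)
    H = [[sum(r[i] * r[j] for r in J) for j in range(n)] for i in range(n)]  # J^T J
    g = [sum(r[i] * eq for r, eq in zip(J, e)) for i in range(n)]          # J^T e
    return H, g


x0 = [0.3, 0.1]
H, g = gauss_newton_terms(x0)
print(H, g)
```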


The Levenberg-Marquardt algorithm uses this approximation to the Hessian 
matrix in the following Newton-like update: 


$$\mathbf{x}_{k+1} = \mathbf{x}_k - \left[\mathbf{J}^T\mathbf{J} + \mu\mathbf{I}\right]^{-1}\mathbf{J}^T\mathbf{e}$$
When the scalar $\mu$ is zero, this is just Newton's method, using the approximate
Hessian matrix. When $\mu$ is large, this becomes gradient descent with a small
step size. Newton's method is faster and more accurate near an error
minimum, so the aim is to shift towards Newton's method as quickly as
possible. Thus, $\mu$ is decreased after each successful step (reduction in
performance function) and is increased only when a tentative step would
increase the performance function. In this way, the performance function will
always be reduced at each iteration of the algorithm.
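The following compact Python sketch implements the update above together with the $\mu$ schedule: decrease $\mu$ after a successful step, and increase it after a step that would raise the error. It fits a hypothetical single tanh neuron to noise-free data. The keyword names echo the trainlm training parameters mu, mu_dec, mu_inc and mu_max, but the values used here are illustrative choices, not statements about the toolbox defaults.

```python
import math

# Hypothetical noise-free data from a single tanh neuron with w = 1.5, b = -0.5.
P = [-2.0, -1.0, 0.0, 1.0, 2.0]
T = [math.tanh(1.5 * p - 0.5) for p in P]


def errors_and_jacobian(x):
    w, b = x
    e, J = [], []
    for p, t in zip(P, T):
        a = math.tanh(w * p + b)
        e.append(t - a)
        da = 1.0 - a * a
        J.append([-da * p, -da])      # de/dw, de/db
    return e, J


def sse(e):
    return sum(v * v for v in e)


def levenberg_marquardt(x, mu=1e-3, mu_dec=0.1, mu_inc=10.0, mu_max=1e10,
                        epochs=100, goal=1e-20):
    e, J = errors_and_jacobian(x)
    for _ in range(epochs):
        if sse(e) < goal:
            break
        JtJ = [[sum(r[i] * r[j] for r in J) for j in range(2)] for i in range(2)]
        Jte = [sum(r[i] * eq for r, eq in zip(J, e)) for i in range(2)]
        while True:
            # Solve [J^T J + mu I] dx = J^T e (2x2 case, Cramer's rule).
            a, b = JtJ[0][0] + mu, JtJ[0][1]
            c, d = JtJ[1][0], JtJ[1][1] + mu
            det = a * d - b * c
            dx = [(d * Jte[0] - b * Jte[1]) / det, (a * Jte[1] - c * Jte[0]) / det]
            x_try = [x[0] - dx[0], x[1] - dx[1]]
            e_try, J_try = errors_and_jacobian(x_try)
            if sse(e_try) < sse(e):
                # Success: accept the step and shift toward Newton's method.
                x, e, J = x_try, e_try, J_try
                mu *= mu_dec
                break
            # Failure: increase mu, giving a shorter, gradient-descent-like step.
            mu *= mu_inc
            if mu > mu_max:
                return x
    return x


x_fit = levenberg_marquardt([1.0, 0.0])
print(x_fit)
```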


In the following code, we reinitialize our previous network and retrain it using
the Levenberg-Marquardt algorithm. The training parameters for trainlm are
epochs, show, goal, time, min_grad, max_fail, mu, mu_dec, mu_inc, mu_max, and
mem_reduc. We have discussed the first six parameters earlier. The parameter
mu is the initial value for $\mu$. This value is multiplied by mu_dec whenever the
performance function is reduced by a step. It is multiplied by mu_inc whenever
a step would increase the performance function. If mu becomes larger than
mu_max, the algorithm is stopped. The parameter mem_reduc is used to control
the amount of memory used by the algorithm. It is discussed in the next
section. The parameters show and epochs are set to 5 and 300, respectively.


p = [-1 -1 2 2; 0 5 0 5];
t = [-1 -1 1 1];
net = newff(minmax(p),[3,1],{'tansig','purelin'},'trainlm');
net.trainParam.show = 5;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr] = train(net,p,t);
TRAINLM, Epoch 0/300, MSE 2.7808/1e-05, Gradient 7.77931/1e-10
TRAINLM, Epoch 4/300, MSE 3.67935e-08/1e-05, Gradient 0.000808272/1e-10
TRAINLM, Performance goal met.
a = sim(net,p)
a =
   -1.0000   -1.0000    1.0000    0.9996


The original description of the Levenberg-Marquardt algorithm is given in
[Marq63]. The application of Levenberg-Marquardt to neural network training
is described in [HaMe94] and starting on page 12-19 of [HDB96]. This
algorithm appears to be the fastest method for training moderate-sized
feedforward neural networks (up to several hundred weights). It also has a
very efficient MATLAB implementation, since the solution of the matrix
equation is a built-in function, so its attributes become even more pronounced
in a MATLAB setting.


Try the Neural Network Design Demonstration nnd12m [HDB96] for an
illustration of the performance of the batch Levenberg-Marquardt algorithm.


Reduced Memory Levenberg-Marquardt (trainlm) 


The main drawback of the Levenberg-Marquardt algorithm is that it requires
the storage of some matrices that can be quite large for certain problems. The
size of the Jacobian matrix is $Q \times n$, where $Q$ is the number of training sets and
$n$ is the number of weights and biases in the network. It turns out that this
matrix does not have to be computed and stored as a whole. For example, if we
were to divide the Jacobian into two equal submatrices we could compute the
approximate Hessian matrix as follows:


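$$\mathbf{H} = \mathbf{J}^T\mathbf{J} = \begin{bmatrix}\mathbf{J}_1^T & \mathbf{J}_2^T\end{bmatrix}\begin{bmatrix}\mathbf{J}_1 \\ \mathbf{J}_2\end{bmatrix} = \mathbf{J}_1^T\mathbf{J}_1 + \mathbf{J}_2^T\mathbf{J}_2$$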


Therefore, the full Jacobian does not have to exist at one time. The
approximate Hessian can be computed by summing a series of subterms. Once
one subterm has been computed, the corresponding submatrix of the Jacobian
can be cleared.


When using the training function trainlm, the parameter mem_reduc is used to
determine how many rows of the Jacobian are to be computed in each
submatrix. If mem_reduc is set to 1, then the full Jacobian is computed, and no
memory reduction is achieved. If mem_reduc is set to 2, then only half of the
Jacobian will be computed at one time. This saves half of the memory used by
the calculation of the full Jacobian.
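The block-wise accumulation can be sketched in Python as follows. The Jacobian rows come from a hypothetical example_row function. The sketch splits the rows into mem_reduc blocks of ceil(Q/mem_reduc) rows each; this partitioning is an assumption made for illustration and does not describe trainlm internals. Comparing `full` with `halves` shows that the summed subterms match the full $\mathbf{J}^T\mathbf{J}$ up to rounding, while only one block of rows exists at a time.

```python
import math


def jtj(rows):
    """J^T J for a list of Jacobian rows."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def jtj_in_blocks(row_fn, num_rows, mem_reduc):
    """Accumulate J^T J from roughly num_rows / mem_reduc rows at a time.

    Each block of rows is built, its subterm J_i^T J_i is added to the running
    sum, and the block is then dropped before the next one is computed.
    """
    block_size = math.ceil(num_rows / mem_reduc)
    total = None
    for start in range(0, num_rows, block_size):
        block = [row_fn(q) for q in range(start, min(start + block_size, num_rows))]
        part = jtj(block)
        total = part if total is None else [
            [a + b for a, b in zip(row_t, row_p)] for row_t, row_p in zip(total, part)]
    return total


def example_row(q):
    """Hypothetical Jacobian row for training case q (three parameters)."""
    return [math.sin(q + 1.0), math.cos(q), 0.1 * q]


full = jtj([example_row(q) for q in range(7)])
halves = jtj_in_blocks(example_row, 7, 2)
print(full)
print(halves)
```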


There is a drawback to using memory reduction. A significant computational
overhead is associated with computing the Jacobian in submatrices. If you
have enough memory available, then it is better to set mem_reduc to 1 and to
compute the full Jacobian. If you have a large training set, and you are running
out of memory, then you should set mem_reduc to 2, and try again. If you still
run out of memory, continue to increase mem_reduc.


Even if you use memory reduction, the Levenberg-Marquardt algorithm will
always compute the approximate Hessian matrix, which has dimensions $n \times n$.
If your network is very large, then you may run out of memory. If this is the
case, then you will want to try trainscg, trainrp, or one of the conjugate
gradient algorithms.
Speed and Memory Comparison


It is very difficult to know which training algorithm will be the fastest for a
given problem. It will depend on many factors, including the complexity of the
problem, the number of data points in the training set, the number of weights
and biases in the network, the error goal, and whether the network is being
used for pattern recognition (discriminant analysis) or function approximation
(regression). In this section we perform a number of benchmark comparisons of
the various training algorithms. We train feedforward networks on six
different problems. Three of the problems fall in the pattern recognition
category and the three others fall in the function approximation category. Two
of the problems are simple “toy” problems, while the other four are “real world”
problems. We use networks with a variety of different architectures and
complexities, and we train the networks to a variety of different accuracy
levels.


The following table lists the algorithms that are tested and the acronyms we
use to identify them.





Acronym   Algorithm
LM        trainlm - Levenberg-Marquardt
BFG       trainbfg - BFGS Quasi-Newton
RP        trainrp - Resilient Backpropagation
SCG       trainscg - Scaled Conjugate Gradient
CGB       traincgb - Conjugate Gradient with Powell/Beale Restarts
CGF       traincgf - Fletcher-Reeves Conjugate Gradient
CGP       traincgp - Polak-Ribière Conjugate Gradient
OSS       trainoss - One-Step Secant
GDX       traingdx - Variable Learning Rate Backpropagation
The following table lists the six benchmark problems and some characteristics
of the networks, training processes, and computers used.











Problem Title  Problem Type      Network Structure  Error Goal  Computer
SIN            Function Approx.  1-5-1              0.002       Sun Sparc 2
PARITY         Pattern Recog.    3-10-10-1          0.001       Sun Sparc 2
ENGINE         Function Approx.  2-30-2             0.005       Sun Enterprise 4000
CANCER         Pattern Recog.    9-5-5-2            0.012       Sun Sparc 2
CHOLESTEROL    Function Approx.  21-15-3            0.027       Sun Sparc 20
DIABETES       Pattern Recog.    8-15-15-2          0.05        Sun Sparc 20

SIN Data Set 


The first benchmark data set is a simple function approximation problem. A
1-5-1 network, with tansig transfer functions in the hidden layer and a linear
transfer function in the output layer, is used to approximate a single period of
a sine wave. The following table summarizes the results of training the
network using nine different training algorithms. Each entry in the table
represents 30 different trials, where different random initial weights are used
in each trial. In each case, the network is trained until the squared error is less
than 0.002. The fastest algorithm for this problem is the Levenberg-Marquardt
algorithm. On the average, it is over four times faster than the next fastest
algorithm. This is the type of problem for which the LM algorithm is best suited:
a function approximation problem where the network has fewer than one
hundred weights and the approximation must be very accurate.
Algorithm   Mean      Ratio   Min.      Max.      Std.
            Time (s)          Time (s)  Time (s)  (s)
LM          1.14      1.00    0.65      1.83      0.38
BFG         5.22      4.58    3.17      14.38     2.08
RP          5.67      4.97    2.66      17.24     3.72
SCG         6.09      5.34    3.18      23.64     3.81
CGB         6.61      5.80    2.99      23.65     3.67
CGF         7.86      6.89    3.57      31.23     4.76
CGP         8.24      7.23    4.07      32.32     5.03
OSS         9.64      8.46    3.97      59.63     9.79
GDX         27.69     24.29   17.21     258.15    43.65
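The Ratio column can be reproduced by dividing each algorithm's mean time by the fastest mean time. The short Python check below does this using the SIN mean times copied from the table above; it adds no new measurements.

```python
# Mean times (s) for the SIN problem, copied from the table above.
mean_time = {"LM": 1.14, "BFG": 5.22, "RP": 5.67, "SCG": 6.09, "CGB": 6.61,
             "CGF": 7.86, "CGP": 8.24, "OSS": 9.64, "GDX": 27.69}

fastest = min(mean_time.values())
ratio = {name: round(t / fastest, 2) for name, t in mean_time.items()}
print(ratio["BFG"], ratio["GDX"])  # 4.58 24.29
```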





The performance of the various algorithms can be affected by the accuracy
required of the approximation. This is demonstrated in the following figure,
which plots the mean square error versus execution time (averaged over the 30
trials) for several representative algorithms. Here we can see that the error in
the LM algorithm decreases much more rapidly with time than the other
algorithms shown.
The relationship between the algorithms is further illustrated in the following
figure, which plots the time required to converge versus the mean square error
convergence goal. Here, we can see that as the error goal is reduced, the
improvement provided by the LM algorithm becomes more pronounced. Some
algorithms perform better as the error goal is reduced (LM and BFG), and
other algorithms degrade as the error goal is reduced (OSS and GDX).


[Figure: SIN data set, time required to converge versus mean-square-error goal; curves for lm, bfg, scg, gdx, cgb, oss and rp]


PARITY Data Set


The second benchmark problem is a simple pattern recognition problem:
detect the parity of a 3-bit number. If the number of ones in the input pattern
is odd, then the network should output a one; otherwise, it should output a
minus one. The network used for this problem is a 3-10-10-1 network with
tansig neurons in each layer. The following table summarizes the results of
training this network with the nine different algorithms. Each entry in the
table represents 30 different trials, where different random initial weights are
used in each trial. In each case, the network is trained until the squared error
is less than 0.001. The fastest algorithm for this problem is the resilient
backpropagation algorithm, although the conjugate gradient algorithms (in
particular, the scaled conjugate gradient algorithm) are almost as fast. Notice
that the LM algorithm does not perform well on this problem. In general, the
LM algorithm does not perform as well on pattern recognition problems as it
does on function approximation problems. The LM algorithm is designed for
least squares problems that are approximately linear. Since the output
neurons in pattern recognition problems will generally be saturated, we will
not be operating in the linear region.
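For reference, the Python snippet below generates the eight training patterns of the 3-bit parity problem. The text does not state how the inputs are coded, so coding them as 0/1 is an assumption. Flipping any single input bit flips the target.

```python
from itertools import product


def parity_data(bits=3):
    """All 2**bits binary input patterns and their parity targets.

    The target is +1 when a pattern has an odd number of ones and -1 otherwise.
    Coding the inputs as 0/1 is an assumption made for this sketch.
    """
    inputs = list(product((0, 1), repeat=bits))
    targets = [1 if sum(x) % 2 == 1 else -1 for x in inputs]
    return inputs, targets


inputs, targets = parity_data()
for x, t in zip(inputs, targets):
    print(x, t)
```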













Algorithm   Mean      Ratio   Min.      Max.      Std.
            Time (s)          Time (s)  Time (s)  (s)
RP          3.73      1.00    2.35      6.89      1.26
SCG         4.09      1.10    2.36      7.48      1.56
CGP         5.13      1.38    3.50      8.73      1.05
CGB         5.30      1.42    3.91      11.59     1.35
CGF         6.62      1.77    3.96      28.05     4.32
OSS         8.00      2.14    5.06      14.41     1.92
LM          13.07     3.50    6.48      23.78     4.96
BFG         19.68     5.28    14.19     26.64     2.85
GDX         27.07     7.26    25.21     28.52     0.86





As with function approximation problems, the performance of the various
algorithms can be affected by the accuracy required of the network. This is
demonstrated in the following figure, which plots the mean square error versus
execution time for some typical algorithms. The LM algorithm converges
rapidly after some point, but only after the other algorithms have already
converged.
The relationship between the algorithms is further illustrated in the following
figure, which plots the time required to converge versus the mean square error
convergence goal. Again we can see that some algorithms degrade as the error
goal is reduced (OSS and BFG).
ENGINE Data Set


The third benchmark problem is a realistic function approximation (or
nonlinear regression) problem. The data is obtained from the operation of an
engine. The inputs to the network are engine speed and fueling levels and the
network outputs are torque and emission levels. The network used for this
problem is a 2-30-2 network with tansig neurons in the hidden layer and linear
neurons in the output layer. The following table summarizes the results of
training this network with the nine different algorithms. Each entry in the
table represents 30 different trials (10 trials for RP and GDX because of time
constraints), where different random initial weights are used in each trial. In
each case, the network is trained until the squared error is less than 0.005. The
fastest algorithm for this problem is the LM algorithm, although the BFGS
quasi-Newton algorithm and the conjugate gradient algorithms (the scaled
conjugate gradient algorithm in particular) are almost as fast. Although this is
a function approximation problem, the LM algorithm is not as clearly superior
as it was on the SIN data set. In this case, the number of weights and biases in
the network is much larger than the one used on the SIN problem (152 versus
16), and the advantages of the LM algorithm decrease as the number of
network parameters increases.








Algorithm   Mean      Ratio   Min.      Max.      Std.
            Time (s)          Time (s)  Time (s)  (s)
LM          18.45     1.00    12.01     30.03     4.27
BFG         27.12     1.47    16.42     47.36     5.95
SCG         36.02     1.95    19.39     52.45     7.78
CGF         37.93     2.06    18.89     50.34     6.12
CGB         39.93     2.16    23.33     55.42     7.50
CGP         44.30     2.40    24.99     71.55     9.89
OSS         48.71     2.64    23.51     80.90     12.33
RP          65.91     3.57    31.83     134.31    34.24
GDX         188.50    10.22   81.59     279.90    66.67
The following figure plots the mean square error versus execution time for
some typical algorithms. The performance of the LM algorithm improves over
time relative to the other algorithms.
The relationship between the algorithms is further illustrated in the following
figure, which plots the time required to converge versus the mean square error
convergence goal. Again we can see that some algorithms degrade as the error
goal is reduced (GDX and RP), while the LM algorithm improves.
CANCER Data Set


The fourth benchmark problem is a realistic pattern recognition (or nonlinear
discriminant analysis) problem. The objective of the network is to classify a
tumor as either benign or malignant based on cell descriptions gathered by
microscopic examination. Input attributes include clump thickness, uniformity
of cell size and cell shape, the amount of marginal adhesion, and the frequency
of bare nuclei. The data was obtained from the University of Wisconsin
Hospitals, Madison, from Dr. William H. Wolberg. The network used for this
problem is a 9-5-5-2 network with tansig neurons in all layers. The following
table summarizes the results of training this network with the nine different
algorithms. Each entry in the table represents 30 different trials, where
different random initial weights are used in each trial. In each case, the
network is trained until the squared error is less than 0.012. A few runs failed
to converge for some of the algorithms, so only the top 75% of the runs from
each algorithm were used to obtain the statistics.


The conjugate gradient algorithms and resilient backpropagation all provide
fast convergence, and the LM algorithm is also reasonably fast. As we
mentioned with the parity data set, the LM algorithm does not perform as well
on pattern recognition problems as it does on function approximation
problems.
Algorithm   Mean      Ratio   Min.      Max.      Std.
            Time (s)          Time (s)  Time (s)  (s)
CGB         80.27     1.00    55.07     102.31    13.17
RP          83.41     1.04    59.51     109.39    13.44
SCG         86.58     1.08    41.21     112.19    18.25
CGP         87.70     1.09    56.35     116.37    18.03
CGF         110.05    1.37    63.33     171.53    30.13
LM          110.33    1.37    58.94     201.07    38.20
BFG         209.60    2.61    118.92    318.18    58.44
GDX         313.22    3.90    166.48    446.43    75.44
OSS         463.87    5.78    250.62    599.99    97.35





The following figure plots the mean square error versus execution time for
some typical algorithms. For this problem we don't see as much variation in
performance as we have seen in previous problems.
The relationship between the algorithms is further illustrated in the following
figure, which plots the time required to converge versus the mean square error
convergence goal. Again we can see that some algorithms degrade as the error
goal is reduced (OSS and BFG) while the LM algorithm improves. It is typical
of the LM algorithm on any problem that its performance improves relative to
other algorithms as the error goal is reduced.
CHOLESTEROL Data Set


The fifth benchmark problem is a realistic function approximation (or
nonlinear regression) problem. The objective of the network is to predict
cholesterol levels (ldl, hdl and vldl) based on measurements of 21 spectral
components. The data was obtained from Dr. Neil Purdie, Department of
Chemistry, Oklahoma State University [PuLu92]. The network used for this
problem is a 21-15-3 network with tansig neurons in the hidden layer and
linear neurons in the output layer. The following table summarizes the results
of training this network with the nine different algorithms. Each entry in the
table represents 20 different trials (10 trials for RP and GDX), where different
random initial weights are used in each trial. In each case, the network is
trained until the squared error is less than 0.027.


The scaled conjugate gradient algorithm has the best performance on this
problem, although all of the conjugate gradient algorithms perform well. The
LM algorithm does not perform as well on this function approximation problem
as it did on the other two. That is because the number of weights and biases in
the network has increased again (378 versus 152 versus 16). As the number of
parameters increases, the computation required in the LM algorithm increases
geometrically.
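The parameter counts quoted above (16, 152 and 378) follow directly from the layer sizes. A layer with n_in inputs and n_out neurons contributes n_in·n_out weights and n_out biases. The Python sketch below recomputes the counts and also shows how large an $n \times n$ approximate Hessian becomes. The storage figure assumes 8-byte double-precision entries; it is not a measured value.

```python
def num_weights_and_biases(layer_sizes):
    """Weights plus biases of a fully connected feedforward net, e.g. [1, 5, 1]."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


counts = {}
for name, sizes in [("SIN", [1, 5, 1]), ("ENGINE", [2, 30, 2]),
                    ("CHOLESTEROL", [21, 15, 3])]:
    n = num_weights_and_biases(sizes)
    counts[name] = n
    # An n x n approximate Hessian of 8-byte doubles needs n * n * 8 bytes.
    print(name, n, n * n * 8)
```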













Algorithm   Mean      Ratio   Min.      Max.      Std.
            Time (s)          Time (s)  Time (s)  (s)
SCG         99.73     1.00    83.10     113.40    9.93
CGP         121.54    1.22    101.76    162.49    16.34
CGB         124.06    1.24    107.64    146.90    14.62
CGF         136.04    1.36    106.46    167.28    17.67
LM          261.50    2.62    103.52    398.45    102.06
OSS         268.55    2.69    197.84    372.99    56.79
BFG         550.92    5.52    471.61    676.39    46.59
RP          1519.00   15.23   581.17    2256.10   557.34
GDX         3169.50   31.78   2514.90   4168.20   610.52





The following figure plots the mean square error versus execution time for
some typical algorithms. For this problem, we can see that the LM algorithm
is able to drive the mean square error to a lower level than the other
algorithms. The SCG and RP algorithms provide the fastest initial
convergence.
The relationship between the algorithms is further illustrated in the following
figure, which plots the time required to converge versus the mean square error
convergence goal. We can see that the LM and BFG algorithms improve
relative to the other algorithms as the error goal is reduced.
DIABETES Data Set


The sixth benchmark problem is a pattern recognition problem. The objective
of the network is to decide if an individual has diabetes, based on personal data
(age, number of times pregnant) and the results of medical examinations (e.g.,
blood pressure, body mass index, result of glucose tolerance test, etc.). The data
was obtained from the University of California, Irvine, machine learning
database. The network used for this problem is an 8-15-15-2 network with tansig
neurons in all layers. The following table summarizes the results of training
this network with the nine different algorithms. Each entry in the table
represents 10 different trials, where different random initial weights are used
in each trial. In each case, the network is trained until the squared error is less
than 0.05.


The conjugate gradient algorithms and resilient backpropagation all provide
fast convergence. The results on this problem are consistent with the other
pattern recognition problems we have considered. The RP algorithm works
well on all of the pattern recognition problems. This is reasonable, since that
algorithm was designed to overcome the difficulties caused by training with
sigmoid functions, which have very small slopes when operating far from the
center point. For pattern recognition problems, we use sigmoid transfer
functions in the output layer, and we want the network to operate at the tails
of the sigmoid function.








Algorithm   Mean      Ratio   Min.      Max.      Std.
            Time (s)          Time (s)  Time (s)  (s)
RP          323.90    1.00    187.43    576.90    111.37
SCG         390.53    1.21    267.99    487.17    75.07
CGB         394.67    1.22    312.25    558.21    85.38
CGP         415.90    1.28    320.62    614.62    94.77
OSS         784.00    2.42    706.89    936.52    76.37
CGF         784.50    2.42    629.42    1082.20   144.63
LM          1028.10   3.17    802.01    1269.50   166.31
BFG         1821.00   5.62    1415.80   3254.50   546.36
GDX         7687.00   23.73   5169.20   10350.00  2015.00





The following figure plots the mean square error versus execution time for
some typical algorithms. As with other problems, we see that the SCG and RP
algorithms have fast initial convergence, while the LM algorithm is able to
provide a smaller final error.
The relationship between the algorithms is further illustrated in the following
figure, which plots the time required to converge versus the mean square error
convergence goal. In this case, we can see that the BFG algorithm degrades as
the error goal is reduced, while the LM algorithm improves. The RP algorithm
is best, except at the smallest error goal, where SCG is better.


[Figure: DIABETES data set, time required to converge versus mean-square-error goal; curves include lm, bfg, gdx, oss and rp]


Summary 


There are several algorithm characteristics that we can deduce from the
experiments we have described. In general, on function approximation
problems, for networks that contain up to a few hundred weights, the
Levenberg-Marquardt algorithm will have the fastest convergence. This
advantage is especially noticeable if very accurate training is required. In
many cases, trainlm is able to obtain lower mean square errors than any of the
other algorithms tested. However, as the number of weights in the network
increases, the advantage of trainlm decreases. In addition, trainlm
performance is relatively poor on pattern recognition problems. The storage
requirements of trainlm are larger than those of the other algorithms tested. By
adjusting the mem_reduc parameter, discussed earlier, the storage
requirements can be reduced, but at a cost of increased execution time.


The trainrp function is the fastest algorithm on pattern recognition problems. 
However, it does not perform well on function approximation problems. Its 
performance also degrades as the error goal is reduced. The memory 
requirements for this algorithm are relatively small in comparison to the other 
algorithms considered. 


The conjugate gradient algorithms, in particular trainscg, seem to perform
well over a wide variety of problems, particularly for networks with a large
number of weights. The SCG algorithm is almost as fast as the LM algorithm
on function approximation problems (faster for large networks) and is almost
as fast as trainrp on pattern recognition problems. Its performance does not
degrade as quickly as trainrp performance does when the error goal is reduced.
The conjugate gradient algorithms have relatively modest memory
requirements.


The trainbfg performance is similar to that of trainlm. It does not require as
much storage as trainlm, but the computation required does increase
geometrically with the size of the network, since the equivalent of a matrix
inverse must be computed at each iteration.


The variable learning rate algorithm traingdx is usually much slower than the
other methods, and has about the same storage requirements as trainrp, but
it can still be useful for some problems. There are certain situations in which
it is better to converge more slowly. For example, when using early stopping (as
described in the next section) you may have inconsistent results if you use an
algorithm that converges too quickly. You may overshoot the point at which the
error on the validation set is minimized.


Improving Generalization


One of the problems that occurs during neural network training is called
overfitting. The error on the training set is driven to a very small value, but
when new data is presented to the network the error is large. The network has
memorized the training examples, but it has not learned to generalize to new
situations.


The following figure shows the response of a 1-20-1 neural network that has
been trained to approximate a noisy sine function. The underlying sine
function is shown by the dotted line, the noisy measurements are given by the
plotted symbols, and the neural network response is given by the solid line.
Clearly this network has overfit the data and will not generalize well.
One method for improving network generalization is to use a network that is
just large enough to provide an adequate fit. The larger a network you use, the
more complex the functions the network can create. If we use a small enough
network, it will not have enough power to overfit the data. Run the Neural
Network Design Demonstration nnd11gn [HDB96] to investigate how reducing
the size of a network can prevent overfitting.


Unfortunately, it is difficult to know beforehand how large a network should be
for a specific application. There are two other methods for improving
generalization that are implemented in the Neural Network Toolbox:
regularization and early stopping. The next few subsections describe these two
techniques, and the routines to implement them.


Note that if the number of parameters in the network is much smaller than the
total number of points in the training set, then there is little or no chance of
overfitting. If you can easily collect more data and increase the size of the
training set, then there is no need to worry about the following techniques to
prevent overfitting. The rest of this section only applies to those situations in
which you want to make the most of a limited supply of data.


Regularization 


The first method for improving generalization is called regularization. This
involves modifying the performance function, which is normally chosen to be
the sum of squares of the network errors on the training set. The next
subsection explains how the performance function can be modified, and the
following subsection describes a routine that automatically sets the optimal
performance function to achieve the best generalization.


Modified Performance Function 


The typical performance function that is used for training feedforward neural
networks is the mean sum of squares of the network errors:

$$F = mse = \frac{1}{N}\sum_{i=1}^{N} e_i^2$$
It is possible to improve generalization if we modify the performance function
by adding a term that consists of the mean of the sum of squares of the network
weights and biases:

$$msereg = \gamma\, mse + (1-\gamma)\, msw$$

where $\gamma$ is the performance ratio, and

$$msw = \frac{1}{n}\sum_{j=1}^{n} w_j^2$$





Using this performance function will cause the network to have smaller
weights and biases, and this will force the network response to be smoother and
less likely to overfit.
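The regularized index can be computed directly from its definition. The Python sketch below uses hypothetical error and weight values and a performance ratio of 0.5. It follows the formula for msereg given above and is not the toolbox's own msereg implementation.

```python
def mse(errors):
    """Mean of the squared network errors."""
    return sum(e * e for e in errors) / len(errors)


def msw(params):
    """Mean of the squared weights and biases."""
    return sum(w * w for w in params) / len(params)


def msereg(errors, params, ratio):
    """Regularized performance: ratio * mse + (1 - ratio) * msw."""
    return ratio * mse(errors) + (1.0 - ratio) * msw(params)


errors = [0.1, -0.2]               # hypothetical network errors
params = [1.0, -2.0, 0.5, 0.0]     # hypothetical weights and biases
print(msereg(errors, params, 0.5))  # 0.5*0.025 + 0.5*1.3125, i.e. about 0.66875
```

With a ratio of 1 the index reduces to the ordinary mse. With a ratio of 0 only the weight penalty remains.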


In the following code we reinitialize our previous network and retrain it using
the BFGS algorithm with the regularized performance function. Here we set
the performance ratio to 0.5, which gives equal weight to the mean square
errors and the mean square weights.


p = [-1 -1 2 2; 0 5 0 5];
t = [-1 -1 1 1];
net = newff(minmax(p),[3,1],{'tansig','purelin'},'trainbfg');
net.performFcn = 'msereg';
net.performParam.ratio = 0.5;
net.trainParam.show = 5;
net.trainParam.epochs = 300;
net.trainParam.goal = 1e-5;
[net,tr] = train(net,p,t);


The problem with regularization is that it is difficult to determine the optimum
value for the performance ratio parameter. If we make this parameter too
large, we may get overfitting. If the ratio is too small, the network will not
adequately fit the training data. In the next section we describe a routine that
automatically sets the regularization parameters.


Automated Regularization (trainbr)

It is desirable to determine the optimal regularization parameters in an
automated fashion. One approach to this process is the Bayesian framework of
David MacKay [MacK92]. In this framework, the weights and biases of the
network are assumed to be random variables with specified distributions. The
regularization parameters are related to the unknown variances associated
with these distributions. We can then estimate these parameters using
statistical techniques.


A detailed discussion of Bayesian regularization is beyond the scope of this
user's guide. A detailed discussion of the use of Bayesian regularization, in
combination with Levenberg-Marquardt training, can be found in [FoHa97].


Bayesian regularization has been implemented in the function trainbr. The
following code shows how we can train a 1-20-1 network using this function to
approximate the noisy sine wave shown earlier in this section.
p = [-1:.05:1];
t = sin(2*pi*p)+0.1*randn(size(p));
net = newff(minmax(p),[20,1],{'tansig','purelin'},'trainbr');
net.trainParam.show = 10;
net.trainParam.epochs = 50;
randn('seed',192736547);
net = init(net);
[net,tr] = train(net,p,t);

TRAINBR, Epoch 0/200, SSE 273.764/0, SSW 21460.5, Grad 2.96e+02/1.00e-10, #Par 6.10e+01/61
TRAINBR, Epoch 40/200, SSE 0.255652/0, SSW 1164.32, Grad 1.74e-02/1.00e-10, #Par 2.21e+01/61
TRAINBR, Epoch 80/200, SSE 0.317534/0, SSW 464.566, Grad 5.65e-02/1.00e-10, #Par 1.78e+01/61
TRAINBR, Epoch 120/200, SSE 0.379938/0, SSW 123.028, Grad 3.64e-01/1.00e-10, #Par 1.17e+01/61
TRAINBR, Epoch 160/200, SSE 0.380578/0, SSW 108.294, Grad 6.43e-02/1.00e-10, #Par 1.19e+01/61


One feature of this algorithm is that it provides a measure of how many
network parameters (weights and biases) are being effectively used by the
network. In this case, the final trained network uses approximately 12
parameters (indicated by #Par in the printout) out of the 61 total weights and
biases in the 1-20-1 network. This effective number of parameters should
remain approximately the same, no matter how large the total number of
parameters in the network becomes. (This assumes that the network has been
trained for a sufficient number of iterations to ensure convergence.)
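The #Par figure can be read through MacKay's effective number of parameters. In his framework it is commonly written as γ = Σ λ_i / (λ_i + α), where α is the regularization parameter on the weights and λ_i are the eigenvalues of the β-scaled Hessian of the data-error term. That notation is assumed here, following [MacK92]. The sketch below evaluates the sum for invented eigenvalues. It shows why directions that the data barely constrain add almost nothing to the count. It is not trainbr's internal computation.

```python
def effective_parameters(eigenvalues, alpha):
    """Sum of lambda / (lambda + alpha) over the Hessian eigenvalues.

    A direction the data pin down firmly (lambda >> alpha) contributes
    close to 1; a direction dominated by the weight penalty
    (lambda << alpha) contributes close to 0.
    """
    return sum(lam / (lam + alpha) for lam in eigenvalues)


# Hypothetical eigenvalues: one well-determined, one borderline, one weak.
gamma = effective_parameters([100.0, 1.0, 0.01], alpha=1.0)
print(gamma)
```

One way to read the observation in the text: extra hidden neurons mostly add weakly determined directions, each contributing little to γ.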


The trainbr algorithm generally works best when the network inputs and
targets are scaled so that they fall approximately in the range [-1,1]. That is
the case for the test problem we have used. If your inputs and targets do not
fall in this range, you can use the functions premnmx or prestd to perform the
scaling, as described later in this chapter.


The following figure shows the response of the trained network. In contrast to
the previous figure, in which a 1-20-1 network overfit the data, here we see that
the network response is very close to the underlying sine function (dotted line),
and, therefore, the network will generalize well to new inputs. We could have
tried an even larger network, but the network response would still not overfit the
data. This eliminates the guesswork required in determining the optimum
network size.







When using trainbr, it is important to let the algorithm run until the effective
number of parameters has converged. The training may stop with the message
“Maximum MU reached.” This is typical, and is a good indication that the
algorithm has truly converged. You can also tell that the algorithm has
converged if the sum squared error (SSE) and sum squared weights (SSW) are
relatively constant over several iterations. When this occurs you may want to
push the “Stop Training” button in the training window.
Early Stopping

Another method for improving generalization is called early stopping. In this
technique the available data is divided into three subsets. The first subset is
the training set, which is used for computing the gradient and updating the
network weights and biases. The second subset is the validation set. The error
on the validation set is monitored during the training process. The validation
error will normally decrease during the initial phase of training, as does the
training set error. However, when the network begins to overfit the data, the
error on the validation set will typically begin to rise. When the validation error
increases for a specified number of iterations, the training is stopped, and the
weights and biases at the minimum of the validation error are returned.


The test set error is not used during the training, but it is used to compare
different models. It is also useful to plot the test set error during the training
process. If the error in the test set reaches a minimum at a significantly
different iteration number than the validation set error, this may indicate a
poor division of the data set.


Early stopping can be used with any of the training functions that were
described earlier in this chapter. You simply need to pass the validation data
to the training function. The following sequence of commands demonstrates
how to use early stopping.


First we create a simple test problem. For our training set we generate a noisy
sine wave with input points ranging from -1 to 1 at steps of 0.05.


p = [-1:0.05:1];
t = sin(2*pi*p)+0.1*randn(size(p));


Next we generate the validation set. The inputs range from -1 to 1, as in the
training set, but we offset them slightly. To make the problem more realistic, we
also add a different noise sequence to the underlying sine wave. Notice that the
validation set is contained in a structure that contains both the inputs and the
targets.


val.P = [-0.975:.05:0.975];
val.T = sin(2*pi*val.P)+0.1*randn(size(val.P));


We now create a 1-20-1 network, as in our previous example with
regularization, and train it. (Notice that the validation structure is passed to
train after the initial input and layer conditions, which are null vectors in this
case since the network contains no delays. Also, in this example we are not
using a test set. The test set structure would be the next argument in the call
to train.) For this example we use the training function traingdx, although
early stopping can be used with any of the other training functions we have
discussed in this chapter.


net = newff([-1 1],[20,1],{'tansig','purelin'},'traingdx');
net.trainParam.show = 25;
net.trainParam.epochs = 300;
net = init(net);
[net,tr] = train(net,p,t,[],[],val);

TRAINGDX, Epoch 0/300, MSE 9.39342/0, Gradient 17.7789/1e-06
TRAINGDX, Epoch 25/300, MSE 0.312465/0, Gradient 0.873551/1e-06
TRAINGDX, Epoch 50/300, MSE 0.102526/0, Gradient 0.206456/1e-06
TRAINGDX, Epoch 75/300, MSE 0.0459503/0, Gradient 0.0954717/1e-06
TRAINGDX, Epoch 100/300, MSE 0.015725/0, Gradient 0.0299898/1e-06
TRAINGDX, Epoch 125/300, MSE 0.00628898/0, Gradient 0.042467/1e-06
TRAINGDX, Epoch 131/300, MSE 0.00650734/0, Gradient 0.133314/1e-06
TRAINGDX, Validation stop.


The following figure shows a graph of the network response. We can see that
the network did not overfit the data, as it did in the earlier example, although the
response is not as smooth as when using regularization. This is
characteristic of early stopping.


[Figure: Function Approximation; vertical axis: Output]
Summary and Discussion 


Both regularization and early stopping can ensure network generalization 
when properly applied. 


When using Bayesian regularization, it is important to train the network until 
it reaches convergence. The sum squared error, the sum squared weights, and 
the effective number of parameters should reach constant values when the 
network has converged. 


For early stopping, you must be careful not to use an algorithm that converges
too rapidly. If you are using a fast algorithm (like trainlm), you want to set the
training parameters so that the convergence is relatively slow (e.g., set mu to a
relatively large value, such as 1, and set mu_dec and mu_inc to values close to
1, such as 0.8 and 1.5, respectively). The training functions trainscg and
trainrp usually work well with early stopping.


With early stopping, the choice of the validation set is also important. The
validation set should be representative of all points in the training set.


With both regularization and early stopping, it is a good idea to train the
network starting from several different initial conditions. It is possible for
either method to fail in certain circumstances. By testing several different
initial conditions, you can verify robust network performance.


Based on our experience, Bayesian regularization generally provides better
generalization performance than early stopping, when training function
approximation networks. This is because Bayesian regularization does not
require that a validation data set be separated out of the training data set. It
uses all of the data. This advantage is especially noticeable when the size of the
data set is small.


To provide you with some insight into the performance of the algorithms, we
tested both early stopping and Bayesian regularization on several benchmark
data sets, which are listed in the following table.





Data Set Title   No. pts.  Network  Description
BALL             67        2-10-1   Dual-sensor calibration for a ball position measurement.
SINE (5% N)      41        1-15-1   Single-cycle sine wave with Gaussian noise at 5% level.
SINE (2% N)      41        1-15-1   Single-cycle sine wave with Gaussian noise at 2% level.
ENGINE (ALL)     1199      2-30-2   Engine sensor - full data set.
ENGINE (1/4)     300       2-30-2   Engine sensor - 1/4 of data set.
CHOLEST (ALL)    264       5-15-3   Cholesterol measurement - full data set.
CHOLEST (1/2)    132       5-15-3   Cholesterol measurement - 1/2 data set.





These data sets are of various sizes, with different numbers of inputs and
targets. With two of the data sets we trained the networks once using all of the
data and then retrained the networks using only a fraction of the data. This
illustrates how the advantage of Bayesian regularization becomes more
noticeable when the data sets are smaller. All of the data sets are obtained from
physical systems, except for the SINE data sets. These two were artificially
created by adding various levels of noise to a single cycle of a sine wave. The
performance of the algorithms on these two data sets illustrates the effect of
noise.


The following table summarizes the performance of Early Stopping (ES) and
Bayesian Regularization (BR) on the seven test sets. (The trainscg algorithm
was used for the early stopping tests. Other algorithms provide similar
performance.)


Mean Squared Test Set Error

Method   Ball     Engine   Engine   Choles   Choles   Sine     Sine
                  (All)    (1/4)    (All)    (1/2)    (5% N)   (2% N)
ES       1.2e-1   1.3e-2   1.9e-2   1.2e-1   1.4e-1   1.7e-1   1.3e-1
BR       1.3e-3   2.6e-3   4.7e-3   1.2e-1   9.3e-2   3.0e-2   6.3e-3
ES/BR    92       5        4        1        1.5      5.7      21





We can see that Bayesian regularization performs better than early stopping
in most cases. The performance improvement is most noticeable when the data
set is small, or if there is little noise in the data set. The BALL data set, for
example, was obtained from sensors that had very little noise.


Although the generalization performance of Bayesian regularization is often
better than early stopping, this is not always the case. In addition, the form of
Bayesian regularization implemented in the toolbox does not perform as well
on pattern recognition problems as it does on function approximation
problems. This is because the approximation to the Hessian that is used in the
Levenberg-Marquardt algorithm is not as accurate when the network output is
saturated, as would be the case in pattern recognition problems. Another
disadvantage of the Bayesian regularization method is that it generally takes
longer to converge than early stopping.


Preprocessing and Postprocessing 







Neural network training can be made more efficient if certain preprocessing
steps are performed on the network inputs and targets. In this section, we
describe several preprocessing routines that you can use.


Min and Max (premnmx, postmnmx, tramnmx)


Before training, it is often useful to scale the inputs and targets so that they 
always fall within a specified range. The function premnmx can be used to scale 
inputs and targets so that they fall in the range [-1,1]. The following code 
illustrates the use of this function.


[pn,minp,maxp,tn,mint,maxt] = premnmx(p,t);
net = train(net,pn,tn);


The original network inputs and targets are given in the matrices p and t. The
normalized inputs and targets, pn and tn, that are returned will all fall in the
interval [-1,1]. The vectors minp and maxp contain the minimum and maximum
values of the original inputs, and the vectors mint and maxt contain the
minimum and maximum values of the original targets. After the network has
been trained, these vectors should be used to transform any future inputs that
are applied to the network. They effectively become a part of the network, just
like the network weights and biases.


If premnmx is used to scale both the inputs and targets, then the output of the
network will be trained to produce outputs in the range [-1,1]. If you want to
convert these outputs back into the same units that were used for the original
targets, then you should use the routine postmnmx. In the following code, we
simulate the network that was trained in the previous code, and then convert
the network output back into the original units.


an = sim(net,pn);
a = postmnmx(an,mint,maxt);


The network output an will correspond to the normalized targets tn. The
un-normalized network output a is in the same units as the original targets t.


If premnmx is used to preprocess the training set data, then whenever the
trained network is used with new inputs they should be preprocessed with the
minimums and maximums that were computed for the training set. This can be
accomplished with the routine tramnmx. In the following code, we apply a new
set of inputs to the network we have already trained.


pnewn = tramnmx(pnew,minp,maxp);
anewn = sim(net,pnewn);
anew = postmnmx(anewn,mint,maxt);
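The scaling performed by premnmx and undone by postmnmx amounts to a per-row linear map. We assume that map is pn = 2(p - min)/(max - min) - 1, with rows as variables and columns as samples, matching the matrix convention used above. The Python sketch below uses hypothetical helper names (scale_minmax, apply_minmax, unscale_minmax) in the roles of premnmx, tramnmx and postmnmx. It assumes no row is constant (max > min).

```python
def apply_minmax(rows, mins, maxs):
    """Map each row to [-1, 1] using stored minimums/maximums (tramnmx's role)."""
    return [[2.0 * (x - lo) / (hi - lo) - 1.0 for x in row]
            for row, lo, hi in zip(rows, mins, maxs)]


def scale_minmax(rows):
    """Compute per-row minimums/maximums, then scale (premnmx's role)."""
    mins = [min(row) for row in rows]
    maxs = [max(row) for row in rows]
    return apply_minmax(rows, mins, maxs), mins, maxs


def unscale_minmax(rows, mins, maxs):
    """Map [-1, 1] values back to the original units (postmnmx's role)."""
    return [[(x + 1.0) * (hi - lo) / 2.0 + lo for x in row]
            for row, lo, hi in zip(rows, mins, maxs)]


p = [[0.0, 5.0, 10.0],      # variable 1, three samples
     [2.0, 4.0, 6.0]]       # variable 2
pn, minp, maxp = scale_minmax(p)
pnew = [[7.5], [3.0]]       # a new input, scaled with the training min/max
pnewn = apply_minmax(pnew, minp, maxp)
print(pn, pnewn, unscale_minmax(pn, minp, maxp))
```

New inputs are mapped with the training-set minimums and maximums. Values outside the training range can therefore land outside [-1, 1], which is expected.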


Mean and Stand. Dev. (prestd, poststd, trastd)


Another approach for scaling network inputs and targets is to normalize the
mean and standard deviation of the training set. This procedure is
implemented in the function prestd. It normalizes the inputs and targets so
that they will have zero mean and unity standard deviation. The following code
illustrates the use of prestd.


[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);


The original network inputs and targets are given in the matrices p and t. The
normalized inputs and targets, pn and tn, that are returned will have zero
means and unity standard deviation. The vectors meanp and stdp contain the
mean and standard deviations of the original inputs, and the vectors meant and
stdt contain the means and standard deviations of the original targets. After
the network has been trained, these vectors should be used to transform any
future inputs that are applied to the network. They effectively become a part of
the network, just like the network weights and biases.


If prestd is used to scale both the inputs and targets, then the output of the
network is trained to produce outputs with zero mean and unity standard
deviation. If you want to convert these outputs back into the same units that
were used for the original targets, then you should use the routine poststd. In
the following code we simulate the network that was trained in the previous
code, and then convert the network output back into the original units.


an = sim(net,pn);
a = poststd(an,meant,stdt);


The network output an corresponds to the normalized targets tn. The 
un-normalized network output a is in the same units as the original targets t. 


If prestd is used to preprocess the training set data, then whenever the trained
network is used with new inputs, they should be preprocessed with the means
and standard deviations that were computed for the training set. This can be
accomplished with the routine trastd. In the following code, we apply a new
set of inputs to the network we have already trained.


pnewn = trastd(pnew,meanp,stdp);
anewn = sim(net,pnewn);
anew = poststd(anewn,meant,stdt);
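Similarly, prestd, trastd and poststd can be pictured as the per-row map pn = (p - mean)/std and its inverse. The sketch assumes the sample standard deviation (divisor n - 1, which is MATLAB's default for std). The helper names are hypothetical stand-ins, not toolbox functions.

```python
import statistics


def apply_std(rows, means, stds):
    """Normalize rows with stored means and standard deviations (trastd's role)."""
    return [[(x - m) / s for x in row] for row, m, s in zip(rows, means, stds)]


def scale_std(rows):
    """Compute per-row mean and sample std, then normalize (prestd's role)."""
    means = [statistics.mean(row) for row in rows]
    stds = [statistics.stdev(row) for row in rows]   # divisor n - 1
    return apply_std(rows, means, stds), means, stds


def unscale_std(rows, means, stds):
    """Return normalized values to the original units (poststd's role)."""
    return [[x * s + m for x in row] for row, m, s in zip(rows, means, stds)]


p = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 30.0]]
pn, meanp, stdp = scale_std(p)
pnewn = apply_std([[2.5], [5.0]], meanp, stdp)   # new input, training statistics
print(pn, pnewn)
```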


Principal Component Analysis (prepca, trapca)


In some situations, the dimension of the input vector is large, but the
components of the vectors are highly correlated (redundant). It is useful in this
situation to reduce the dimension of the input vectors. An effective procedure
for performing this operation is principal component analysis. This technique
has three effects: it orthogonalizes the components of the input vectors (so that
they are uncorrelated with each other); it orders the resulting orthogonal
components (principal components) so that those with the largest variation
come first; and it eliminates those components that contribute the least to the
variation in the data set. The following code illustrates the use of prepca, which
performs a principal component analysis.


[pn,meanp,stdp] = prestd(p) ; 
[ptrans,transMat] = prepca(pn,0.02) ; 


Note that we first normalize the input vectors, using prestd, so that they have
zero mean and unity variance. This is a standard procedure when using
principal components. In this example, the second argument passed to prepca
is 0.02. This means that prepca eliminates those principal components that
contribute less than 2% to the total variation in the data set. The matrix
ptrans contains the transformed input vectors. The matrix transMat contains
the principal component transformation matrix. After the network has been
trained, this matrix should be used to transform any future inputs that are
applied to the network. It effectively becomes a part of the network, just like
the network weights and biases. If you multiply the normalized input vectors
pn by the transformation matrix transMat, you obtain the transformed input
vectors ptrans.


If prepca is used to preprocess the training set data, then whenever the trained
network is used with new inputs they should be preprocessed with the
transformation matrix that was computed for the training set. This can be
accomplished with the routine trapca. In the following code, we apply a new
set of inputs to a network we have already trained.
pnewn = trastd(pnew,meanp,stdp);
pnewtrans = trapca(pnewn,transMat);
a = sim(net,pnewtrans);
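The self-contained Python sketch below illustrates the idea behind prepca and trapca. It forms the covariance matrix of the normalized inputs, takes its eigenvectors, and sorts them by variance. It then keeps only those components whose share of the total variance is at least the given fraction. The eigen-decomposition uses cyclic Jacobi rotations so that no external library is needed. That is our choice for illustration; the toolbox may compute the components differently. Rows are variables and columns are samples, as in the MATLAB code, and all function names are hypothetical.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi).

    Returns (values, v); column k of v is the eigenvector for values[k].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_apply(trans_mat, x):
    """Project data (rows = variables, columns = samples) onto kept components."""
    n_samples = len(x[0])
    return [[sum(row[i] * x[i][k] for i in range(len(row))) for k in range(n_samples)]
            for row in trans_mat]


def pca_fit(pn, min_frac):
    """PCA on zero-mean rows; drop components below min_frac of total variance."""
    n, q = len(pn), len(pn[0])
    cov = [[sum(pn[i][k] * pn[j][k] for k in range(q)) / (q - 1) for j in range(n)]
           for i in range(n)]
    values, vecs = jacobi_eigh(cov)
    total = sum(values)
    order = sorted(range(n), key=lambda k: values[k], reverse=True)
    keep = [k for k in order if values[k] / total >= min_frac]
    trans_mat = [[vecs[i][k] for i in range(n)] for k in keep]  # one row per component
    return pca_apply(trans_mat, pn), trans_mat


pn = [[-1.0, 0.0, 1.0],   # two perfectly correlated inputs,
      [-1.0, 0.0, 1.0]]   # already normalized (zero mean, unit std)
ptrans, trans_mat = pca_fit(pn, 0.02)
pnewtrans = pca_apply(trans_mat, [[0.5], [0.5]])
print(len(ptrans), ptrans, pnewtrans)
```

Because the two inputs are redundant, only one component survives the 2% threshold. This mirrors the dimension reduction described in the text.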


Post-Training Analysis (postreg)


The performance of a trained network can be measured to some extent by the 
errors on the training, validation and test sets, but it is often useful to 
investigate the network response in more detail. One option is to perform a 
regression analysis between the network response and the corresponding 
targets. The routine postreg is designed to perform this analysis. 


The following commands illustrate how we can perform a regression analysis
on the network that we previously trained in the early stopping section.


a = sim(net,p);
[m,b,r] = postreg(a,t)

m =
    0.9874

b =
   -0.0067

r =
    0.9935


Here we pass the network output and the corresponding targets to postreg. It
returns three parameters. The first two, m and b, correspond to the slope and
the y-intercept of the best linear regression relating targets to network
outputs. If we had a perfect fit (outputs exactly equal to targets), the slope
would be 1, and the y-intercept would be 0. In this example, we can see that the
numbers are very close. The third variable returned by postreg is the
correlation coefficient (R-value) between the outputs and targets. It is a
measure of how well the variation in the output is explained by the targets. If
this number is equal to 1, then there is perfect correlation between targets and
outputs. In our example, the number is very close to 1, which indicates a good
fit.
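The three numbers returned by postreg can be reproduced with ordinary least squares. The Python sketch below fits outputs = m * targets + b and computes the correlation coefficient r. It is written from the definitions in the text, not from the postreg source, and linear_fit is a hypothetical name.

```python
import math


def linear_fit(outputs, targets):
    """Least-squares fit outputs ~ m * targets + b, plus correlation r."""
    n = len(targets)
    mt = sum(targets) / n
    ma = sum(outputs) / n
    stt = sum((t - mt) ** 2 for t in targets)
    saa = sum((a - ma) ** 2 for a in outputs)
    sta = sum((t - mt) * (a - ma) for t, a in zip(targets, outputs))
    m = sta / stt                    # slope
    b = ma - m * mt                  # y-intercept
    r = sta / math.sqrt(stt * saa)   # correlation coefficient (R-value)
    return m, b, r


targets = [0.0, 1.0, 2.0, 3.0]
print(linear_fit([0.0, 1.0, 2.0, 3.0], targets))   # perfect fit
print(linear_fit([1.0, 3.0, 5.0, 7.0], targets))   # outputs = 2 * targets + 1
```

The second case shows why m and b matter alongside r. The outputs are perfectly correlated with the targets (r = 1), yet they are not equal to them.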


The following figure illustrates the graphical output provided by postreg. The
network outputs are plotted versus the targets as open circles. The best linear
fit is indicated by a dashed line. The perfect fit (output equal to targets) is
indicated by the solid line. In this example, it is difficult to distinguish the best
linear fit line from the perfect fit line, because the fit is so good.
[Figure: postreg output. Best Linear Fit: A = (0.987)T + (-0.00667); legend: Data Points, Best Linear Fit]


5 Backpropagation 





Sample Training Session 




We have covered a number of different concepts in this chapter. At this point it
might be useful to put some of these ideas together with an example of how a
typical training session might go.


For this example, we are going to use data from a medical application
[PuLu92]. We want to design an instrument that can determine serum
cholesterol levels from measurements of spectral content of a blood sample. We
have a total of 264 patients for which we have measurements of 21 wavelengths
of the spectrum. For the same patients we also have measurements of hdl, ldl,
and vldl cholesterol levels, based on serum separation. The first step is to load
the data into the workspace and perform a principal component analysis.


load choles_all
[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);
[ptrans,transMat] = prepca(pn,0.001);


Here we have conservatively retained those principal components which 
account for 99.9% of the variation in the data set. Let's check the size of the 
transformed data. 


[R,Q] = size(ptrans)


There was apparently significant redundancy in the data set, since the
principal component analysis has reduced the size of the input vectors from 21
to 4.


The next step is to divide the data up into training, validation and test subsets.
We will take one fourth of the data for the validation set, one fourth for the test
set and one half for the training set. We pick the sets as equally spaced points
throughout the original data.


iitst = 2:4:Q;
iival = 4:4:Q;
iitr = [1:4:Q 3:4:Q];
val.P = ptrans(:,iival); val.T = tn(:,iival);
test.P = ptrans(:,iitst); test.T = tn(:,iitst);
ptr = ptrans(:,iitr); ttr = tn(:,iitr);
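The index vectors above interleave the samples so that each subset spans the whole data set. For readers reproducing the split outside MATLAB, a 0-based Python equivalent is sketched below: MATLAB's 2:4:Q becomes range(1, Q, 4), and so on. interleaved_split is a hypothetical helper, and Q = 264 matches the cholesterol data.

```python
def interleaved_split(q):
    """0-based equivalents of iitr = [1:4:Q 3:4:Q], iival = 4:4:Q, iitst = 2:4:Q."""
    iitr = list(range(0, q, 4)) + list(range(2, q, 4))  # one half for training
    iival = list(range(3, q, 4))                         # one fourth for validation
    iitst = list(range(1, q, 4))                         # one fourth for testing
    return iitr, iival, iitst


iitr, iival, iitst = interleaved_split(264)
print(len(iitr), len(iival), len(iitst))
```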







We are now ready to create a network and train it. For this example, we will
try a two-layer network, with tan-sigmoid transfer function in the hidden layer
and a linear transfer function in the output layer. This is a useful structure for
function approximation (or regression) problems. As an initial guess, we use
five neurons in the hidden layer. The network should have three output
neurons since there are three targets. We will use the Levenberg-Marquardt
algorithm for training.


net = newff(minmax(ptr),[5 3],{'tansig' 'purelin'},'trainlm');
[net,tr] = train(net,ptr,ttr,[],[],val,test);

TRAINLM, Epoch 0/100, MSE 3.11023/0, Gradient 804.959/1e-10
TRAINLM, Epoch 15/100, MSE 0.330295/0, Gradient 104.219/1e-10
TRAINLM, Validation stop.


The training stopped after 15 iterations because the validation error increased.
It is a useful diagnostic tool to plot the training, validation and test errors to
check the progress of training. We can do that with the following commands.


plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
legend('Training','Validation','Test',-1);
ylabel('Squared Error'); xlabel('Epoch')


The result is shown in the following figure. The result here is reasonable, since
the test set error and the validation set error have similar characteristics, and
it doesn't appear that any significant overfitting has occurred.
The next step is to perform some analysis of the network response. We will put
the entire data set through the network (training, validation and test) and will
perform a linear regression between the network outputs and the
corresponding targets. First we need to unnormalize the network outputs.


an = sim(net,ptrans);
a = poststd(an,meant,stdt);
for i=1:3
  figure(i)
  [m(i),b(i),r(i)] = postreg(a(i,:),t(i,:));
end


In this case, we have three outputs, so we perform three regressions. The
results are shown in the following figures.


[Figure: postreg regression for output 1; legend: Data Points, A = T, Best Linear Fit]

[Figure: postreg regression for output 2. Best Linear Fit: A = (0.753)T + (31.7)]

[Figure: postreg regression for output 3. Best Linear Fit: A = (0.346)T + (28.3)]
The first two outputs seem to track the targets reasonably well (this is a
difficult problem), and the R-values are almost 0.9. The third output (vldl
levels) is not well modeled. We probably need to work more on that problem.
We might go on to try other network architectures (more hidden layer
neurons), or to try Bayesian regularization instead of early stopping for our
training technique. Of course there is also the possibility that vldl levels cannot
be accurately computed based on the given spectral components.


The function demobp1 contains a slide show demonstration of the sample
training session. The function nnsample1 contains all of the commands that we
used in this section. You can use it as a template for your own training sessions.
Limitations and Cautions


The gradient descent algorithm is generally very slow because it requires small
learning rates for stable learning. The momentum variation is usually faster
than simple gradient descent, since it allows higher learning rates while
maintaining stability, but it is still too slow for many practical applications.
These two methods would normally be used only when incremental training is
desired. You would normally use Levenberg-Marquardt training for small and
medium size networks, if you have enough memory available. If memory is a
problem, then there are a variety of other fast algorithms available. For large
networks you will probably want to use trainscg or trainrp.


Multi-layered networks are capable of performing just about any linear or 
nonlinear computation, and can approximate any reasonable function 
arbitrarily well. Such networks overcome the problems associated with the 
perceptron and linear networks. However, while the network being trained 
may be theoretically capable of performing correctly, backpropagation and its 
variations may not always find a solution. See page 12-8 of [HDB96] for a 
discussion of convergence to local minimum points. 


Picking the learning rate for a nonlinear network is a challenge. As with linear
networks, a learning rate that is too large leads to unstable learning.
Conversely, a learning rate that is too small results in incredibly long training
times. Unlike linear networks, there is no easy way of picking a good learning
rate for nonlinear multilayer networks. See page 12-8 of [HDB96] for examples
of choosing the learning rate. With the faster training algorithms, the default
parameter values normally perform adequately.
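For the linear case, where the error surface is quadratic, the stability limit can be stated exactly. This helps explain why no single step size suits every region of a nonlinear error surface. With error f(x) = 0.5 * c * x**2, each gradient-descent step multiplies x by (1 - lr * c). The iteration therefore converges only when 0 < lr < 2/c. The Python sketch below demonstrates both sides of that limit; the curvature value is arbitrary.

```python
def descend_quadratic(curvature, lr, x0=1.0, steps=50):
    """Fixed-step gradient descent on f(x) = 0.5 * curvature * x**2."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x      # the gradient of f is curvature * x
    return x


c = 10.0                             # stable step sizes satisfy lr < 2 / c = 0.2
small = descend_quadratic(c, 0.05)   # factor 0.5 per step: shrinks toward 0
large = descend_quadratic(c, 0.25)   # factor -1.5 per step: oscillates and grows
print(abs(small), abs(large))
```

In a multilayer network the local curvature changes as the weights move. A rate that is safe in one region can therefore be unstable in another.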


The error surface of a nonlinear network is more complex than the error
surface of a linear network. To understand this complexity see the figures on
pages 12-5 to 12-7 of [HDB96], which show three different error surfaces for a
multilayer network. The problem is that nonlinear transfer functions in
multilayer networks introduce many local minima in the error surface. As
gradient descent is performed on the error surface it is possible for the network
solution to become trapped in one of these local minima. This may happen
depending on the initial starting conditions. Settling in a local minimum may
be good or bad depending on how close the local minimum is to the global
minimum and how low an error is required. In any case, be cautioned that
although a multilayer backpropagation network with enough neurons can
implement just about any function, backpropagation will not always find the
correct weights for the optimum solution. You may want to reinitialize the
network and retrain several times to improve your chances of finding the best solution.
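The effect of the starting point can be seen on a one-dimensional toy error surface with two minima. The function f in the Python sketch below is invented for illustration and is not a network error surface. Plain gradient descent from x = 2 settles in the shallower local minimum near x = 1.13. Keeping the best result over several starting points finds the deeper minimum near x = -1.30. Retraining from several initial weights applies the same idea to a real network.

```python
def f(x):
    """Toy error surface: local minimum near x = 1.13, global minimum near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x, lr=0.01, steps=2000):
    """Fixed-step gradient descent from the starting point x."""
    for _ in range(steps):
        x -= lr * df(x)
    return x


def best_of_restarts(starts):
    """Train from every start and keep the end point with the lowest error."""
    return min((descend(x0) for x0 in starts), key=f)


single = best_of_restarts([2.0])                    # one initialization
several = best_of_restarts([2.0, -2.0, 0.5, -0.5])  # several initializations
print(single, f(single))
print(several, f(several))
```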


Networks are also sensitive to the number of neurons in their hidden layers.
Too few neurons can lead to underfitting. Too many neurons can contribute to
overfitting, in which all training points are well fit, but the fitting curve takes
wild oscillations between these points. Ways of dealing with these
issues are discussed in the section on improving generalization. This topic is
also discussed starting on page 11-21 of [HDB96].


Summary 







Backpropagation can train multilayer feed-forward networks with
differentiable transfer functions to perform function approximation, pattern
association, and pattern classification. (Other types of networks can be trained
as well, although the multilayer network is most commonly used.) The term
backpropagation refers to the process by which derivatives of network error,
with respect to network weights and biases, can be computed. This process can
be used with a number of different optimization strategies.


The architecture of a multilayer network is not completely constrained by the
problem to be solved. The number of inputs to the network is constrained by
the problem, and the number of neurons in the output layer is constrained by
the number of outputs required by the problem. However, the number of layers
between network inputs and the output layer and the sizes of the layers are up
to the designer.


The two-layer sigmoid/linear network can approximate any reasonable functional
relationship between inputs and outputs arbitrarily well if the sigmoid layer has enough
neurons.
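The Python sketch below makes the sigmoid/linear architecture concrete by evaluating a 1-S-1 network. It has a tansig hidden layer (tansig(n) = 2/(1 + exp(-2n)) - 1, which equals tanh(n)) followed by a linear output neuron. The weights are hand-picked for illustration. They show how two hidden neurons combine into a localized bump, the kind of building block that lets a large enough hidden layer approximate general functions. The sketch only evaluates the network; it does not train it.

```python
import math


def tansig(n):
    """Tan-sigmoid transfer function, 2 / (1 + exp(-2n)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def two_layer(p, w1, b1, w2, b2):
    """1-S-1 network: tansig hidden layer, linear (purelin) output neuron."""
    hidden = [tansig(w * p + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


# Two hidden neurons with opposite output weights form a bump centred at p = 0.
w1, b1 = [1.0, 1.0], [1.0, -1.0]
w2, b2 = [1.0, -1.0], 0.0
for p in (-3.0, 0.0, 3.0):
    print(p, two_layer(p, w1, b1, w2, b2))
```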


There are several different backpropagation training algorithms. They have a
variety of different computation and storage requirements, and no one
algorithm is best suited to all situations. The following list summarizes the
training algorithms included in the toolbox.





Function     Description

traingd      Basic gradient descent. Slow response; can be used in
             incremental mode training.

traingdm     Gradient descent with momentum. Generally faster than
             traingd. Can be used in incremental mode training.

traingdx     Adaptive learning rate. Faster training than traingd, but
             can only be used in batch mode training.

trainrp      Resilient backpropagation. Simple batch mode training
             algorithm with fast convergence and minimal storage
             requirements.

traincgf     Fletcher-Reeves conjugate gradient algorithm. Has the
             smallest storage requirements of the conjugate gradient
             algorithms.

traincgp     Polak-Ribière conjugate gradient algorithm. Slightly larger
             storage requirements than traincgf. Faster convergence
             on some problems.

traincgb     Powell-Beale conjugate gradient algorithm. Slightly larger
             storage requirements than traincgp. Generally faster
             convergence.

trainscg     Scaled conjugate gradient algorithm. The only conjugate
             gradient algorithm that requires no line search. A very
             good general-purpose training algorithm.

trainbfg     BFGS quasi-Newton method. Requires storage of the
             approximate Hessian matrix and has more computation in
             each iteration than conjugate gradient algorithms, but
             usually converges in fewer iterations.

trainoss     One step secant method. Compromise between conjugate
             gradient methods and quasi-Newton methods.

trainlm      Levenberg-Marquardt algorithm. Fastest training
             algorithm for networks of moderate size. Has a memory
             reduction feature for use when the training set is large.

trainbr      Bayesian regularization. Modification of the
             Levenberg-Marquardt training algorithm to produce
             networks that generalize well. Reduces the difficulty of
             determining the optimum network architecture.
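The difference between the first few update rules in the table can be seen on a toy problem. The plain-Python sketch below is not the toolbox implementation: it minimizes the ill-conditioned quadratic f(x) = 0.5(x0² + 25·x1²) with basic gradient descent, with gradient descent plus a momentum update of the form dX = mc·dXprev − lr·(1 − mc)·gradient, and with a simple resilient-backpropagation rule that looks only at the sign of each gradient component. The learning rates, momentum constant, and Rprop step factors (1.2 and 0.5) are assumed values chosen for this example.

```python
def grad(x):
    """Gradient of the ill-conditioned quadratic f(x) = 0.5*(x0**2 + 25*x1**2)."""
    return [x[0], 25.0 * x[1]]

def loss(x):
    return 0.5 * (x[0] ** 2 + 25.0 * x[1] ** 2)

def sign(v):
    return (v > 0) - (v < 0)

def train_gd(x, lr=0.07, epochs=300):
    """Basic gradient descent: dX = -lr * gradient."""
    x = list(x)
    for _ in range(epochs):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x

def train_gdm(x, lr=0.5, mc=0.9, epochs=300):
    """Gradient descent with momentum: dX = mc*dXprev - lr*(1 - mc)*gradient."""
    x = list(x)
    dx = [0.0] * len(x)
    for _ in range(epochs):
        dx = [mc * d - lr * (1.0 - mc) * gi for d, gi in zip(dx, grad(x))]
        x = [xi + di for xi, di in zip(x, dx)]
    return x

def train_rp(x, delta0=0.07, inc=1.2, dec=0.5, delta_max=50.0, epochs=300):
    """Resilient backpropagation: per-weight step sizes driven only by gradient signs."""
    x = list(x)
    delta = [delta0] * len(x)
    g_prev = [0.0] * len(x)
    for _ in range(epochs):
        g = grad(x)
        for i in range(len(x)):
            if g[i] * g_prev[i] > 0:      # same sign as last time: grow the step
                delta[i] = min(delta[i] * inc, delta_max)
            elif g[i] * g_prev[i] < 0:    # sign flipped: we overshot, shrink the step
                delta[i] *= dec
            x[i] -= sign(g[i]) * delta[i]
        g_prev = g
    return x

start = [1.0, 1.0]
for trainer in (train_gd, train_gdm, train_rp):
    print(trainer.__name__, loss(trainer(start)))
```

Note that the Rprop step ignores the gradient's magnitude, so the badly scaled x1 direction is handled no differently from x0. Published Rprop variants differ in how they treat a sign change; this is the simplest one.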





One problem that can occur when training neural networks is that the network
can overfit on the training set and not generalize well to new data outside the
training set. This can be prevented by training with trainbr, but it can also be
prevented by using early stopping with any of the other training routines. This
requires that the user pass a validation set to the training algorithm, in
addition to the standard training set.
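Early stopping amounts to checking the error on the validation set after each training epoch and keeping the weights that gave the lowest validation error. The Python sketch below shows that logic in a generic form. `step` and `val_error` are hypothetical callbacks standing in for one epoch of any training routine and for evaluating the network on the validation set, and `max_fail` (the number of consecutive epochs without improvement that ends training) is an assumed stopping rule.

```python
def train_with_early_stopping(step, val_error, params, max_epochs=1000, max_fail=5):
    """Train until the validation error stops improving.

    step(params): parameters after one more epoch on the training set
    val_error(params): error of those parameters on the validation set
    Returns (best_params, best_error, epochs_run).
    """
    best_params, best_err = params, val_error(params)
    fails = 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        params = step(params)
        err = val_error(params)
        if err < best_err:
            best_params, best_err, fails = params, err, 0
        else:
            fails += 1
            if fails >= max_fail:   # validation error has stalled or risen: stop
                break
    return best_params, best_err, epoch

# Toy illustration: the "parameters" are just an epoch counter, and the validation
# error is lowest at epoch 10, after which the model starts to overfit.
best, err, epochs = train_with_early_stopping(
    step=lambda k: k + 1,
    val_error=lambda k: (k - 10) ** 2,
    params=0,
)
print(best, err, epochs)
```

Training runs five epochs past the minimum before stopping, but the weights returned are the ones from the best validation epoch.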


Summary 





To produce the most efficient training, it is often helpful to preprocess the data
before training. It is also helpful to analyze the network response after training
is complete. The toolbox contains a number of routines for pre- and
post-processing. They are summarized in the following table.





Function     Description

premnmx      Normalize data to fall in the range [-1,1].

postmnmx     Inverse of premnmx. Used to convert data back to standard
             units.

tramnmx      Normalize data using previously computed minimums and
             maximums. Used to preprocess new inputs to networks that
             have been trained with data normalized with premnmx.

prestd       Normalize data to have zero mean and unity standard
             deviation.

poststd      Inverse of prestd. Used to convert data back to standard
             units.

trastd       Normalize data using previously computed means and
             standard deviations. Used to preprocess new inputs to
             networks that have been trained with data normalized with
             prestd.

prepca       Principal component analysis. Reduces dimension of input
             vector and decorrelates components of input vectors.

trapca       Preprocess data using previously computed principal
             component transformation matrix. Used to preprocess new
             inputs to networks that have been trained with data
             transformed with prepca.

postreg      Linear regression between network outputs and targets.
             Used to determine the adequacy of network fit.
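The min/max and mean/standard-deviation normalizations in the table are simple linear maps. The Python sketch below shows the arithmetic for a single variable, assuming the mapping pn = 2(p − minp)/(maxp − minp) − 1 for the [-1,1] case and the sample standard deviation for the zero-mean case. The function names are illustrative and are not the toolbox routines. The point it illustrates is that new inputs must be scaled with the statistics saved from the training data (the role of tramnmx and trastd), not with statistics recomputed from the new data.

```python
import statistics

def minmax_fit(p):
    """Statistics needed to map the training data p onto [-1, 1]."""
    return min(p), max(p)

def minmax_apply(p, minp, maxp):
    """pn = 2*(p - minp)/(maxp - minp) - 1 (analogous to premnmx/tramnmx)."""
    return [2.0 * (x - minp) / (maxp - minp) - 1.0 for x in p]

def minmax_invert(pn, minp, maxp):
    """Map normalized values back to original units (analogous to postmnmx)."""
    return [0.5 * (x + 1.0) * (maxp - minp) + minp for x in pn]

def std_fit(p):
    """Mean and sample standard deviation of the training data."""
    return statistics.mean(p), statistics.stdev(p)

def std_apply(p, meanp, stdp):
    """pn = (p - meanp)/stdp (analogous to prestd/trastd)."""
    return [(x - meanp) / stdp for x in p]

def std_invert(pn, meanp, stdp):
    """Map back to original units (analogous to poststd)."""
    return [x * stdp + meanp for x in pn]

train = [0.0, 5.0, 10.0]
minp, maxp = minmax_fit(train)
print(minmax_apply(train, minp, maxp))
print(minmax_apply([7.5], minp, maxp))   # a new input, scaled with the saved statistics
```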
Introduction


Neural networks have been applied very successfully in the identification and 
control of dynamic systems. The universal approximation capabilities of the 
multilayer perceptron make it a popular choice for modeling nonlinear systems
and for implementing general-purpose nonlinear controllers [HaDe99]. This 
chapter introduces three popular neural network architectures for prediction 
and control that have been implemented in the Neural Network Toolbox: 


• Model Predictive Control
• NARMA-L2 (or Feedback Linearization) Control
• Model Reference Control


This chapter presents brief descriptions of each of these architectures and 
demonstrates how you can use them. 


There are typically two steps involved when using neural networks for control: 
1 System Identification 


2 Control Design 


In the system identification stage, you develop a neural network model of the 
plant that you want to control. In the control design stage, you use the neural 
network plant model to design (or train) the controller. In each of the three 
control architectures described in this chapter, the system identification stage 
is identical. The control design stage, however, is different for each 
architecture. 


• For the model predictive control, the plant model is used to predict future
  behavior of the plant, and an optimization algorithm is used to select the
  control input that optimizes future performance.

• For the NARMA-L2 control, the controller is simply a rearrangement of the
  plant model.

• For the model reference control, the controller is a neural network that is
  trained to control a plant so that it follows a reference model. The neural
  network plant model is used to assist in the controller training.


The next three sections of this chapter discuss model predictive control,
NARMA-L2 control, and model reference control. Each section consists of a brief
description of the control concept, followed by a demonstration of the use of the
appropriate Neural Network Toolbox function. These three controllers are
implemented as Simulink blocks, which are contained in the Neural Network
Toolbox blockset.


To assist you in determining the best controller for your application, the 
following list summarizes the key controller features. Each controller has its 
own strengths and weaknesses. No single controller is appropriate for every
application. 


Model Predictive Control. This controller uses a neural network model to
predict future plant responses to potential control signals. An optimization
algorithm then computes the control signals that optimize future plant
performance. The neural network plant model is trained offline, in batch
form, using any of the training algorithms discussed in Chapter 5. (This is
true for all three control architectures.) The controller, however, requires a
significant amount of online computation, since an optimization algorithm
is run at each sample time to compute the optimal control input.


NARMA-L2 Control. This controller requires the least computation of the
three architectures described in this chapter. The controller is simply a
rearrangement of the neural network plant model, which is trained offline,
in batch form. The only online computation is a forward pass through the
neural network controller. The drawback of this method is that the plant
must either be in companion form, or be capable of approximation by a
companion form model. (The companion form model is described later in this
chapter.)


Model Reference Control. The online computation of this controller, like 
NARMA-L2, is minimal. However, unlike NARMA-L2, the model reference 
architecture requires that a separate neural network controller be trained 
offline, in addition to the neural network plant model. The controller
training is computationally expensive, since it requires the use of dynamic
backpropagation [HaJe99]. On the positive side, model reference control 
applies to a larger class of plants than does NARMA-L2 control.
NN Predictive Control


The neural network predictive controller that is implemented in the Neural
Network Toolbox uses a neural network model of a nonlinear plant to predict
future plant performance. The controller then calculates the control input that
will optimize plant performance over a specified future time horizon. The first
step in model predictive control is to determine the neural network plant model
(system identification). Next, the plant model is used by the controller to
predict future performance. (See the Model Predictive Control Toolbox User's
Guide for complete coverage of the application of various model predictive
control strategies to linear systems.)


The following section describes the system identification process. This is 
followed by a description of the optimization process. Finally, it discusses how 
to use the model predictive controller block that has been implemented in 
Simulink. 


System Identification 


The first stage of model predictive control is to train a neural network to
represent the forward dynamics of the plant. The prediction error between the
plant output and the neural network output is used as the neural network
training signal. The process is represented by the following figure.
The neural network plant model uses previous inputs and previous plant
outputs to predict future values of the plant output. The structure of the neural
network plant model is given in the following figure.
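To make "previous inputs and previous plant outputs" concrete, the Python sketch below turns recorded plant data into input/target pairs for a one-step-ahead model. The ordering of the regressor (outputs first, then inputs, newest first) and the delay counts are assumptions for illustration; in the toolbox these are set in the Plant Identification window rather than in code.

```python
def make_training_pairs(u, y, n_u, n_y):
    """Build (regressor, target) pairs for a one-step-ahead plant model.

    Each regressor holds the n_y most recent plant outputs y(k), ..., y(k-n_y+1)
    followed by the n_u most recent inputs u(k), ..., u(k-n_u+1); the target
    is the next plant output y(k+1).
    """
    if len(u) != len(y):
        raise ValueError("u and y must be sampled at the same instants")
    first = max(n_u, n_y) - 1          # earliest k with a full history
    pairs = []
    for k in range(first, len(y) - 1):
        past_y = [y[k - i] for i in range(n_y)]
        past_u = [u[k - i] for i in range(n_u)]
        pairs.append((past_y + past_u, y[k + 1]))
    return pairs

u = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [10.0, 11.0, 12.0, 13.0, 14.0]
for regressor, target in make_training_pairs(u, y, n_u=2, n_y=2):
    print(regressor, "->", target)
```

Each pair is one training example for the hidden-layer network; batch training then fits all pairs at once.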
This network can be trained offline in batch mode, using data collected from
the operation of the plant. Any of the training algorithms discussed in Chapter
5 can be used for network training. This process is discussed in more detail
later in this chapter.
The model predictive control method is based on the receding horizon
technique [SoHa96]. The neural network model predicts the plant response 
over a specified time horizon. The predictions are used by a numerical 
optimization program to determine the control signal that minimizes the 
following performance criterion over the specified horizon. 


$$J = \sum_{j=N_1}^{N_2} \left( y_r(t+j) - y_m(t+j) \right)^2 + \rho \sum_{j=1}^{N_u} \left( u'(t+j-1) - u'(t+j-2) \right)^2$$


where $N_1$, $N_2$, and $N_u$ define the horizons over which the tracking error and
the control increments are evaluated. The $u'$ variable is the tentative control
signal, $y_r$ is the desired response, and $y_m$ is the network model response. The
$\rho$ value determines the contribution that the sum of the squares of the control
increments has on the performance index.
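As a worked example of the performance index, the Python function below evaluates J for one candidate control sequence. It assumes the reference and predicted outputs are passed only for j = N1..N2, and that the control applied at the previous sample, u'(t−1), is supplied separately so the first increment u'(t) − u'(t−1) can be formed. The names are illustrative, not part of the toolbox.

```python
def mpc_cost(y_r, y_m, u_tent, u_prev, rho):
    """Performance index J for one candidate control sequence.

    y_r, y_m: reference and model-predicted outputs y(t+j) for j = N1..N2
    u_tent: tentative controls u'(t), ..., u'(t+Nu-1)
    u_prev: u'(t-1), the control applied at the previous sample
    rho: weight on the squared control increments
    """
    tracking = sum((r - m) ** 2 for r, m in zip(y_r, y_m))
    u_seq = [u_prev] + list(u_tent)            # u_seq[j] = u'(t+j-1)
    increments = sum((u_seq[j] - u_seq[j - 1]) ** 2 for j in range(1, len(u_seq)))
    return tracking + rho * increments

J = mpc_cost(y_r=[1.0, 1.0, 1.0], y_m=[0.5, 0.8, 1.0],
             u_tent=[0.2, 0.3], u_prev=0.0, rho=0.1)
print(J)
```

Here the tracking term is 0.25 + 0.04 + 0 = 0.29 and the increment term is 0.1 × (0.04 + 0.01) = 0.005, so J = 0.295. A larger ρ makes the optimizer prefer smoother control moves at the cost of slower tracking.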


The following block diagram illustrates the model predictive control process.
The controller consists of the neural network plant model and the optimization
block. The optimization block determines the values of $u'$ that minimize $J$, and
then the optimal $u$ is input to the plant. The controller block has been
implemented in Simulink, as described in the following section.
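The following Python sketch shows the receding-horizon loop end to end under heavy simplifications, all of them assumptions: a hypothetical analytic model stands in for the trained network and also plays the role of the plant, Nu = 1 (one tentative control held over the horizon), N1 = 1, and a brute-force grid search replaces the toolbox's line-search minimization. At each sample the whole horizon is optimized, but only the first control move is applied, and the optimization is repeated at the next sample.

```python
import math

def model(y, u):
    """Hypothetical one-step plant model standing in for the trained network."""
    return 0.8 * y + 0.5 * math.tanh(u)

def predict(y, u, n2):
    """Predict y(t+1), ..., y(t+n2), holding the tentative control u (Nu = 1)."""
    out = []
    for _ in range(n2):
        y = model(y, u)
        out.append(y)
    return out

def choose_control(y, r, u_prev, n2=5, rho=0.01):
    """Minimize J (N1 = 1, Nu = 1) by brute-force search over a grid on [-2, 2]."""
    candidates = [i / 100.0 for i in range(-200, 201)]
    def cost(u):
        tracking = sum((r - ym) ** 2 for ym in predict(y, u, n2))
        return tracking + rho * (u - u_prev) ** 2
    return min(candidates, key=cost)

def run_closed_loop(r=1.0, steps=50):
    """Receding horizon: optimize over the horizon, apply only the first move."""
    y, u = 0.0, 0.0
    for _ in range(steps):
        u = choose_control(y, r, u)
        y = model(y, u)      # the "plant" is assumed identical to the model here
    return y, u

y_final, u_final = run_closed_loop()
print(round(y_final, 3), u_final)
```

With a real plant the model is only approximate, which is why the optimization is repeated at every sample using the latest measured output.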
Using the NN Predictive Controller Block


This section demonstrates how the NN Predictive Controller block is used. The
first step is to copy the NN Predictive Controller block from the Neural
Network Toolbox blockset to your model window. See your Simulink
documentation if you are not sure how to do this. This step is skipped in the
following demonstration.


A demo model is provided with the Neural Network Toolbox to demonstrate the
predictive controller. This demo uses a catalytic Continuous Stirred Tank 
Reactor (CSTR). A diagram of the process is shown in the following figure. 
where $h(t)$ is the liquid level, $C_b(t)$ is the product concentration at the output
of the process, $w_1(t)$ is the flow rate of the concentrated feed $C_{b1}$, and $w_2(t)$
is the flow rate of the diluted feed $C_{b2}$. The input concentrations are set to
$C_{b1} = 24.9$ and $C_{b2} = 0.1$. The constants associated with the rate of
consumption are $k_1 = 1$ and $k_2 = 1$.


The objective of the controller is to maintain the product concentration by
adjusting the flow $w_2(t)$. To simplify the demonstration, we set $w_1(t) = 0.1$.
The level of the tank $h(t)$ is not controlled for this experiment.
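The text above lists the CSTR variables, but the model equations themselves appear only in the omitted figure. The Python sketch below assumes the form commonly used for this benchmark, dh/dt = w1 + w2 − 0.2√h and dCb/dt = (Cb1 − Cb)·w1/h + (Cb2 − Cb)·w2/h − k1·Cb/(1 + k2·Cb)², and integrates it with forward Euler. Treat the equations, the initial state, and the step size as assumptions for illustration. With both flows held at 0.1 the level settles where 0.2√h = w1 + w2, that is h = 1.

```python
import math

def cstr_derivatives(h, cb, w1, w2, cb1=24.9, cb2=0.1, k1=1.0, k2=1.0):
    """Assumed CSTR model: returns (dh/dt, dCb/dt)."""
    dh = w1 + w2 - 0.2 * math.sqrt(h)
    dcb = ((cb1 - cb) * w1 / h + (cb2 - cb) * w2 / h
           - k1 * cb / (1.0 + k2 * cb) ** 2)
    return dh, dcb

def simulate_cstr(w2, h=10.0, cb=22.0, w1=0.1, dt=0.01, t_end=200.0):
    """Forward-Euler integration with constant feed flows w1 and w2."""
    for _ in range(int(round(t_end / dt))):
        dh, dcb = cstr_derivatives(h, cb, w1, w2)
        h, cb = h + dt * dh, cb + dt * dcb
    return h, cb

h_end, cb_end = simulate_cstr(w2=0.1)
print(round(h_end, 4), round(cb_end, 3))
```

A simulator like this is only a stand-in for the Simulink plant block; the controller itself never sees these equations, only the neural network model identified from input/output data.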


To run this demo, follow these steps.
1 Start MATLAB. 


2 Run the demo model by typing predcstr in the MATLAB command
window. This command starts Simulink and creates the following model 
window. The NN Predictive Controller block has already been placed in the 
model. 
[Figure callouts: This NN Predictive Controller block was copied from the Neural Network Toolbox blockset to this model window. The Control Signal was connected to the input of the plant model. The output of the plant model was connected to Plant Output. The reference signal was connected to Reference. The plant block contains the Simulink CSTR plant model.]
3 Double-click on the NN Predictive Controller block. This brings up the
following window for designing the model predictive controller. This window
enables you to change the controller horizons $N_2$ and $N_u$. ($N_1$ is fixed at 1.)
The weighting parameter $\rho$, described earlier, is also defined in this
window. The parameter $\alpha$ is used to control the optimization. It determines
how much reduction in performance is required for a successful optimization
step. You can select which linear minimization routine is used by the
optimization algorithm, and you can decide how many iterations of the
optimization algorithm are performed at each sample time. The linear
minimization routines are slight modifications of those discussed in
Chapter 5.
4 Select Plant Identification. This opens the following window. The neural
network plant model must be developed before the controller is used. The
plant model predicts future plant outputs. The optimization algorithm uses
these predictions to determine the control inputs that optimize future
performance. The plant model neural network has one hidden layer, as
shown earlier. The size of that layer, the number of delayed inputs and
delayed outputs, and the training function are selected in this window. You
can select any of the training functions described in Chapter 5 to train the
neural network plant model.
5 Select the Generate Training Data button. The program generates
training data by applying a series of random step inputs to the Simulink
plant model. The potential training data is then displayed in a figure similar
to the following.
6 Select Accept Data, and then select Train Network from the Plant
Identification window. Plant model training begins. The training proceeds
according to the selected training algorithm (trainlm in this case). This is a
straightforward application of batch training, as described in Chapter 5.
After the training is complete, the response of the resulting plant model is
displayed, as in the following figure. (There are also separate plots for
validation and testing data, if they exist.) You can then continue training
with the same data set by selecting Train Network again, you can select Erase
Generated Data and generate a new data set, or you can accept the current
plant model and begin simulating the closed loop system. For this
demonstration, begin the simulation, as shown in the following steps.
7 Select OK in the Plant Identification window. This loads the trained neural
network plant model into the NN Predictive Controller block.


8 Select OK in the Neural Network Predictive Control window. This loads the 
controller parameters into the NN Predictive Controller block. 


9 Return to the Simulink model and start the simulation by choosing the
Start command from the Simulation menu. As the simulation runs, the
plant output and the reference signal are displayed, as in the following
figure.


XY Graph 
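Step 5 above generates plant training data by applying a series of random step inputs. The Python sketch below produces that kind of excitation signal: random levels held for random durations. The level range, interval range, and sample time are assumed values for illustration, not the demo's settings.

```python
import random

def random_steps(n_samples, dt, u_min, u_max, min_interval, max_interval, seed=0):
    """Piecewise-constant signal: random levels held for random durations.

    Levels are uniform on [u_min, u_max]; each level is held for a duration
    drawn uniformly from [min_interval, max_interval] seconds, sampled every dt.
    """
    rng = random.Random(seed)
    u = []
    while len(u) < n_samples:
        level = rng.uniform(u_min, u_max)
        hold = max(1, round(rng.uniform(min_interval, max_interval) / dt))
        u.extend([level] * hold)
    return u[:n_samples]

u = random_steps(n_samples=1000, dt=0.2, u_min=0.0, u_max=4.0,
                 min_interval=5.0, max_interval=20.0)
print(len(u), min(u), max(u), len(set(u)))
```

Varying both the step heights and the hold times helps the recorded data cover the plant's operating range and its transient behavior, which is what the plant model needs to learn.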
NARMA-L2 (Feedback Linearization) Control
The neurocontroller described in this section is referred to by two different
names: feedback linearization control and NARMA-L2 control. It is referred to
as feedback linearization when the plant model has a particular form
(companion form). It is referred to as NARMA-L2 control when the plant model
can be approximated by the same form. The central idea of this type of control
is to transform nonlinear system dynamics into linear dynamics by canceling
the nonlinearities. This section begins by presenting the companion form
system model and demonstrating how you can use a neural network to identify
this model. Then it describes how the identified neural network model can be
used to develop a controller. This is followed by a demonstration of how to use
the NARMA-L2 Control block, which is contained in the Neural Network
Toolbox blockset.


Identification of the NARMA-L2 Model


As with the model predictive control, the first step in using feedback 
linearization (or NARMA-L2 control) is to identify the system to be controlled. 
You train a neural network to represent the forward dynamics of the system. 
The first step is to choose a model structure to use. One standard model that 
has been used to represent general discrete-time nonlinear systems is the
Nonlinear Autoregressive-Moving Average (NARMA) model: 


$$y(k+d) = N\left[ y(k), y(k-1), \ldots, y(k-n+1), u(k), u(k-1), \ldots, u(k-n+1) \right]$$


where $u(k)$ is the system input, and $y(k)$ is the system output. For the
identification phase, you could train a neural network to approximate the
nonlinear function $N$. This is the identification procedure used for the NN
Predictive Controller.


If you want the system output to follow some reference trajectory,
$y(k+d) = y_r(k+d)$, the next step is to develop a nonlinear controller of the
form:


$$u(k) = G\left[ y(k), y(k-1), \ldots, y(k-n+1), y_r(k+d), u(k-1), \ldots, u(k-m+1) \right]$$


The problem with using this controller is that if you want to train a neural
network to create the function $G$ that will minimize mean square error, you
need to use dynamic backpropagation ([NaPa91] or [HaJe99]). This can be
quite slow. One solution proposed by Narendra and Mukhopadhyay [NaMu97]
is to use approximate models to represent the system. The controller used in
this section is based on the NARMA-L2 approximate model:


$$\hat{y}(k+d) = f\left[ y(k), y(k-1), \ldots, y(k-n+1), u(k-1), \ldots, u(k-m+1) \right] + g\left[ y(k), y(k-1), \ldots, y(k-n+1), u(k-1), \ldots, u(k-m+1) \right] \cdot u(k)$$


This model is in companion form, where the next controller input $u(k)$ is not
contained inside the nonlinearity. The advantage of this form is that you can
solve for the control input that causes the system output to follow the reference
$y(k+d) = y_r(k+d)$. The resulting controller would have the form

$$u(k) = \frac{y_r(k+d) - f\left[ y(k), y(k-1), \ldots, y(k-n+1), u(k-1), \ldots, u(k-m+1) \right]}{g\left[ y(k), y(k-1), \ldots, y(k-n+1), u(k-1), \ldots, u(k-m+1) \right]}$$


Using this equation directly can cause realization problems, because you must
determine the control input $u(k)$ based on the output at the same time, $y(k)$.
So, instead, use the model


$$y(k+d) = f\left[ y(k), y(k-1), \ldots, y(k-n+1), u(k), u(k-1), \ldots, u(k-n+1) \right] + g\left[ y(k), \ldots, y(k-n+1), u(k), \ldots, u(k-n+1) \right] \cdot u(k+1)$$


where $d \ge 2$. The following figure shows the structure of a neural network
representation.
[Figure: neural network representation of the NARMA-L2 model, with subnetworks labeled Neural Network Approximation of g() and Neural Network Approximation of f()]


NARMA-L2 Controller
Using the NARMA-L2 model, you can obtain the controller 


$$u(k+1) = \frac{y_r(k+d) - f\left[ y(k), \ldots, y(k-n+1), u(k), \ldots, u(k-n+1) \right]}{g\left[ y(k), \ldots, y(k-n+1), u(k), \ldots, u(k-n+1) \right]}$$


which is realizable for $d \ge 2$. The following figure is a block diagram of the
NARMA-L2 controller.
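The controller equation above is a single division once f and g are available. The Python sketch below applies it to a hypothetical plant that is exactly in NARMA-L2 form with d = 2 and one delay of each signal (n = 1): y(k+2) = f[y(k), u(k)] + g[y(k), u(k)]·u(k+1). The particular f and g are invented for illustration and stand in for the two trained subnetworks; g is kept at least 1 so the division is safe. Because the model matches the plant exactly here, the output follows the reference from k = 2 onward; with trained networks the match is only approximate.

```python
import math

def f(y, u):
    """Hypothetical stand-in for the trained f-network (one delay of y and u)."""
    return 0.6 * y + 0.2 * math.sin(u)

def g(y, u):
    """Hypothetical stand-in for the trained g-network; always >= 1 here."""
    return 1.0 + 0.5 * math.cos(y) ** 2

def narma_l2_control(y_ref_ahead, y_k, u_k, g_min=1e-6):
    """u(k+1) = (y_r(k+d) - f[y(k), u(k)]) / g[y(k), u(k)], here with d = 2."""
    g_k = g(y_k, u_k)
    if abs(g_k) < g_min:
        raise ValueError("g is too close to zero; the control law is undefined")
    return (y_ref_ahead - f(y_k, u_k)) / g_k

def simulate(y_ref, y0=0.0, y1=0.0, u0=0.0):
    """Plant assumed to be exactly y(k+2) = f[y(k), u(k)] + g[y(k), u(k)] * u(k+1)."""
    y, u = [y0, y1], [u0]
    for k in range(len(y_ref) - 2):
        u.append(narma_l2_control(y_ref[k + 2], y[k], u[k]))   # computes u(k+1)
        y.append(f(y[k], u[k]) + g(y[k], u[k]) * u[k + 1])     # plant produces y(k+2)
    return y, u

y_ref = [math.sin(0.1 * k) for k in range(100)]
y, u = simulate(y_ref)
print(max(abs(a - b) for a, b in zip(y[2:], y_ref[2:])))
```

The controller at time k only needs y(k) and u(k), which are already known, to produce u(k+1); that is what makes the d ≥ 2 form realizable.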
This controller can be implemented with the previously identified NARMA-L2
plant model, as shown in the following figure.
[Figure: NARMA-L2 controller built from the identified plant model, with subnetworks labeled Neural Network Approximation of g() and Neural Network Approximation of f()]


Using the NARMA-L2 Controller Block


This section demonstrates how the NARMA-L2 controller is trained. The first
step is to copy the NARMA-L2 Controller block from the Neural Network
Toolbox blockset to your model window. See your Simulink documentation if
you are not sure how to do this. This step is skipped in the following
demonstration.


A demo model is provided with the Neural Network Toolbox to demonstrate the
NARMA-L2 controller. In this demo, the objective is to control the position of a
magnet suspended above an electromagnet, where the magnet is constrained
so that it can only move in the vertical direction, as in the following figure.
The equation of motion for this system is

$$\frac{d^2 y(t)}{dt^2} = -g + \frac{\alpha}{M} \frac{i^2(t)}{y(t)} - \frac{\beta}{M} \frac{dy(t)}{dt}$$


where $y(t)$ is the distance of the magnet above the electromagnet, $i(t)$ is the
current flowing in the electromagnet, $M$ is the mass of the magnet, and $g$ is
the gravitational constant. The parameter $\beta$ is a viscous friction coefficient
that is determined by the material in which the magnet moves, and $\alpha$ is a field
strength constant that is determined by the number of turns of wire on the
electromagnet and the strength of the magnet.
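The equation of motion can be explored numerically. The Python sketch below integrates it with forward Euler and computes the current that holds the magnet still at a given height. Setting the acceleration and velocity to zero gives i² = g·M·y/α. The parameter values (β = 12, α = 15, g = 9.8, M = 3) are assumptions for illustration only.

```python
import math

# Assumed parameter values for illustration only.
BETA = 12.0    # viscous friction coefficient
ALPHA = 15.0   # field strength constant
GRAV = 9.8     # gravitational constant g
MASS = 3.0     # magnet mass M

def maglev_accel(y, v, i):
    """d2y/dt2 = -g + (alpha/M) * i**2 / y - (beta/M) * dy/dt."""
    return -GRAV + (ALPHA / MASS) * i ** 2 / y - (BETA / MASS) * v

def hover_current(y):
    """Current that holds the magnet at rest at height y (acceleration = 0, v = 0)."""
    return math.sqrt(GRAV * MASS * y / ALPHA)

def simulate(y, v, i, dt=1e-3, t_end=1.0):
    """Forward-Euler integration with a constant current i."""
    for _ in range(int(round(t_end / dt))):
        a = maglev_accel(y, v, i)
        y, v = y + dt * v, v + dt * a
    return y, v

i_hold = hover_current(2.0)
print(i_hold, simulate(2.0, 0.0, i_hold))
```

The current enters the dynamics squared and divided by the position, which is the kind of nonlinearity the NARMA-L2 controller is meant to cancel.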


To run this demo, follow these steps. 
1 Start MATLAB. 


2 Run the demo model by typing narmamaglev in the MATLAB command 
window. This command starts Simulink and creates the following model 
window. The NARMA-L2 Control block has already been placed in the 
model. 


3 Double-click on the NARMA-L2 Controller block. This brings up the
following window. Notice that this window enables you to train the
NARMA-L2 model. There is no separate window for the controller, since the
controller is determined directly from the model, unlike the model predictive
controller.
4 Since this window works the same as the other Plant Identification
windows, we won't go through the training process again now. Instead, let's
simulate the NARMA-L2 controller.


5 Return to the Simulink model and start the simulation by choosing the
Start command from the Simulation menu. As the simulation runs, the
plant output and the reference signal are displayed, as in the following
figure.
Model Reference Control


The neural model reference control architecture uses two neural networks: a
controller network and a plant model network, as shown in the following
figure. The plant model is identified first, and then the controller is trained so
that the plant output follows the reference model output.
The figure on the following page shows the details of the neural network plant
model and the neural network controller, as they are implemented in the
Neural Network Toolbox. Each network has two layers, and you can select the
number of neurons to use in the hidden layers. There are three sets of
controller inputs:

• Delayed reference inputs
• Delayed controller outputs
• Delayed plant outputs

For each of these inputs, you can select the number of delayed values to use.
Typically, the number of delays increases with the order of the plant. There are
two sets of inputs to the neural network plant model:

• Delayed controller outputs
• Delayed plant outputs


As with the controller, you can set the number of delays. The next section 
demonstrates how you can set the parameters. 
Using the Model Reference Controller Block


This section demonstrates how the neural network controller is trained. The
first step is to copy the Model Reference Control block from the Neural Network
Toolbox blockset to your model window. See your Simulink documentation if
you are not sure how to do this. This step is skipped in the following
demonstration.


A demo model is provided with the Neural Network Toolbox to demonstrate the
model reference controller. In this demo, the objective is to control the
movement of a simple, single-link robot arm, as shown in the following figure.





The equation of motion for the arm is

$$\frac{d^2\phi}{dt^2} = -10 \sin\phi - 2 \frac{d\phi}{dt} + u$$


where $\phi$ is the angle of the arm, and $u$ is the torque supplied by the DC motor.
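A quick numerical check of the arm model: with a constant torque u the arm settles where the gravity term balances the torque, 10·sin φ = u, because the velocity and acceleration are then zero. The Python sketch below integrates the equation of motion with forward Euler. The step size, duration, and torque value are assumptions for illustration.

```python
import math

def arm_accel(phi, omega, u):
    """d2(phi)/dt2 = -10 sin(phi) - 2 d(phi)/dt + u."""
    return -10.0 * math.sin(phi) - 2.0 * omega + u

def simulate_arm(u, phi=0.0, omega=0.0, dt=1e-3, t_end=20.0):
    """Forward-Euler integration of the arm with a constant torque u."""
    for _ in range(int(round(t_end / dt))):
        acc = arm_accel(phi, omega, u)
        phi, omega = phi + dt * omega, omega + dt * acc
    return phi, omega

phi_end, omega_end = simulate_arm(u=5.0)
print(phi_end, math.asin(5.0 / 10.0))
```

With u = 5 the arm comes to rest at φ = asin(0.5) = π/6, after a damped oscillation caused by the −2·dφ/dt friction term.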


The objective is to train the controller so that the arm tracks the reference 
model 





where $y_r$ is the output of the reference model, and $r$ is the input reference
signal.
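The reference model equation is not reproduced in this text. To illustrate what "tracks the reference model" means, the Python sketch below assumes a critically damped second-order model, d²y_r/dt² = −9·y_r − 6·dy_r/dt + 9·r, and simulates its response to a constant reference. That equation is an assumption here, chosen because it has unit steady-state gain and no overshoot; the controller is trained so that the arm angle reproduces whatever trajectory the reference model produces.

```python
def simulate_reference_model(r, dt=1e-3, t_end=10.0):
    """Assumed reference model: d2yr/dt2 = -9*yr - 6*dyr/dt + 9*r, r held constant."""
    yr, vr = 0.0, 0.0
    trajectory = []
    for _ in range(int(round(t_end / dt))):
        ar = -9.0 * yr - 6.0 * vr + 9.0 * r
        yr, vr = yr + dt * vr, vr + dt * ar
        trajectory.append(yr)
    return trajectory

traj = simulate_reference_model(r=0.5)
print(max(traj), traj[-1])
```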
This demo uses a neural network controller with a 5-13-1 architecture. The
inputs to the controller consist of two delayed reference inputs, two delayed
plant outputs, and one delayed controller output. A sampling interval of 0.05
seconds is used.
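The 5-13-1 description can be read directly as a data flow: five inputs (2 reference values, 2 plant outputs, 1 controller output) feed 13 hidden neurons and one output neuron. The Python sketch below shows that flow. The ordering of the stacked inputs, the tanh hidden layer, and the random weights are assumptions for illustration; in the demo the weights come from dynamic-backpropagation training.

```python
import math
import random

def controller_inputs(r_hist, y_hist, u_hist, n_r=2, n_y=2, n_u=1):
    """Stack the delayed signals into one controller input vector.

    Each *_hist list is ordered newest first, e.g. r_hist = [r(k), r(k-1), ...].
    With the demo's delay counts this gives 2 + 2 + 1 = 5 inputs.
    """
    return list(r_hist[:n_r]) + list(y_hist[:n_y]) + list(u_hist[:n_u])

def controller_forward(x, W1, b1, W2, b2):
    """5-13-1 network: tanh hidden layer (13 neurons), single linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

rng = random.Random(1)
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(13)]
b1 = [rng.uniform(-0.5, 0.5) for _ in range(13)]
W2 = [rng.uniform(-0.5, 0.5) for _ in range(13)]
b2 = 0.0

x = controller_inputs(r_hist=[0.4, 0.3], y_hist=[0.1, 0.05], u_hist=[0.2])
print(len(x), controller_forward(x, W1, b1, W2, b2))
```

Feeding the controller's own previous output back as an input is what makes the closed loop recurrent, and is the reason ordinary (static) backpropagation is not enough to train it.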


To run this demo, follow these steps. 


1 Start MATLAB. 


2 Run the demo model by typing mrefrobotarm in the MATLAB command
window. This command starts Simulink and creates the following model 
window. The Model Reference Control block has already been placed in the 
model. 
3 Double-click on the Model Reference Control block. This brings up the
following window for training the model reference controller. 
[Figure: Model Reference Control window (File, Window, Help menus). Network Architecture: Size of Hidden Layer 13; Sampling Interval [sec] 0.05; Normalize Training Data; No. Delayed Reference Inputs 2; No. Delayed Controller Outputs 1; No. Delayed Plant Outputs 2. Training Data: Maximum Reference Value 0.7; Maximum Interval Value [sec] 2; Minimum Interval Value [sec] 0.1; Controller Training Samples; Reference Model robot_ref; Generate Training Data, Import Data, and Export Data buttons. Training Parameters: Controller Training Epochs; Controller Training Segments; Use Current Weights; Use Cumulative Training. Buttons: Plant Identification, Train Controller, OK, Cancel, Apply.
Callouts: This block specifies the inputs to the controller. The File menu has several items, including ones that allow you to import and export controller and plant networks. You must specify a Simulink model for the plant to follow. The parameters in this block specify the random reference input for training; the reference is a series of random steps at random intervals. You must generate or import training data before you can train the controller. Specify the number of training epochs for each segment. If Use Cumulative Training is selected, segments of data are added to the training set as training continues; otherwise, only one segment at a time is used. With Use Current Weights, the current weights are used as initial conditions to continue training. The Plant Identification button opens the Plant Identification window; the plant must be identified before the controller is trained. After the controller has been trained, select OK or Apply to load the network into the Simulink model.]








4 The next step would normally be to select Plant Identification, which
opens the Plant Identification window. You would then train the plant 
model. Since the Plant Identification window is identical to the one used 
with the previous controllers, we won't go through that process here. 
5 Select Generate Data. The program then starts generating the data for
training the controller. After the data is generated, the following window 
appears. 









[Figure: Plant Input-Output Data for NN Model Reference Control window, with axes Reference Model Input and Reference Model Output and buttons Accept Data and Refuse Data. Callouts: Select Accept Data if the training data shows enough variation to adequately train the controller. If the data is not adequate, select Refuse Data and then go back to the Controller window and select Generate Data again.]








6 Select Accept Data. Return to the Model Reference Control window and
select Train Controller. The program presents one segment of data to the
network and trains the network for a specified number of iterations (five in
this case). This process continues one segment at a time until the entire
training set has been presented to the network. Controller training can be
significantly more time consuming than plant model training. This is
because the controller must be trained using dynamic backpropagation (see
[HaJe99]). After the training is complete, the response of the resulting
closed loop system is displayed, as in the following figure.


[Figure: Plant Response for NN Model Reference Control window. Top axis, Reference Model Input: displays the random reference input that was used for training. Bottom axis, Reference Model Output (blue) and Plant Output (green): displays the response of the reference model and the response of the closed loop plant. The plant response should follow the reference model.]





7 Go back to the Model Reference Control window. If the performance of the
controller is not accurate, then you can select Train Controller again,
which continues the controller training with the same data set. If you would
like to use a new data set to continue training, then select Generate Data or
Import Data before you select Train Controller. (Be sure that Use
Current Weights is selected if you want to continue training with the
same weights.) It may also be necessary to retrain the plant model. If the
plant model is not accurate, it can affect the controller training. For this
demonstration, the controller should be accurate enough, so select OK. This
loads the controller weights into the Simulink model.
8 Return to the Simulink model and start the simulation by selecting the
Start command from the Simulation menu. As the simulation runs, the
plant output and the reference signal are displayed, as in the following
figure.


XY Graph
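Step 6 describes presenting the controller training data one segment at a time, with a fixed number of iterations per segment (five in the demo), and the Model Reference Control window offers a Use Cumulative Training option that adds segments to the training set as training continues. The Python sketch below is a hypothetical rendering of that schedule. `train_step` stands in for one iteration of dynamic-backpropagation training and is not a toolbox function.

```python
def segment_ranges(n_samples, n_segments, cumulative):
    """(start, stop) sample indices presented at each stage of controller training.

    Non-cumulative: one fresh segment per stage.
    Cumulative: each stage adds the next segment to everything presented so far.
    """
    size = n_samples // n_segments
    ranges = []
    for s in range(n_segments):
        stop = n_samples if s == n_segments - 1 else (s + 1) * size
        start = 0 if cumulative else s * size
        ranges.append((start, stop))
    return ranges

def train_in_segments(train_step, data, n_segments, iterations, cumulative=False):
    """Call train_step(batch) `iterations` times on each stage's data."""
    for start, stop in segment_ranges(len(data), n_segments, cumulative):
        batch = data[start:stop]
        for _ in range(iterations):
            train_step(batch)

print(segment_ranges(100, 4, cumulative=False))
print(segment_ranges(100, 4, cumulative=True))
```

Short segments keep each dynamic-backpropagation pass cheap; the cumulative option trades that speed for keeping earlier data in view as training proceeds.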
Importing and Exporting


You can save networks and training data to the workspace or to a disk file. The
following two sections demonstrate how you can do this.


Importing and Exporting Networks


The controller and plant model networks that you develop are stored within 
Simulink controller blocks. At some point you may want to transfer the 
networks into other applications, or you may want to transfer a network from
one controller block to another. You can do this by using the Import Network 
and Export Network menu commands. The following demonstration leads you 
through the export and import processes. (We use the NARMA-L2 window for 
this demonstration, but the same procedure applies to all of the controllers.) 


1 Repeat the first three steps of the NARMA-L2 demonstration. The
NARMA-L2 Plant Identification window should then be open. 


2 Select Export Network from the File menu, as shown below.


[Figure: Plant Identification - NARMA-L2 window with the File menu open: Import Network..., Export Network..., Save, Save and Exit, Exit without Saving]


This causes the following window to open. 
[Figure: Export Neural Network Plant-Controller Parameters window. Select Variables: Neural Network Controller Weights (netn_contr) and Neural Network Plant Weights (netn_plant); Use Neural Network Object Definition check box; buttons Export to Disk, Export to Simulink, Export to Workspace, Help, Cancel. Callouts: Here you can select which variables or networks will be exported. Here you can choose names for the network objects. You can save the networks as network objects, or as weights and biases. You can send the networks to disk, or to the workspace. You can also save the networks as Simulink models.]










3 Select Export to Disk. The following window opens. Enter the filename
test in the box, and select Save. This saves the controller and plant 
networks to disk. 


[Figure: Save dialog (Save in: temp). Callout: "The filename goes here." File name: test; Save as type: MAT-files (*.mat)]


4 Retrieve that data with the Import menu command. Select Import
Network from the File menu, as in the following figure.
[Figure: Plant Identification - NARMA-L2 window with the File menu open (Import Network... Ctrl+I, Export Network... Ctrl+E, Save Ctrl+S, Save and Exit)]

This causes the following window to appear. Follow the steps indicated in
the following figure to retrieve the data that you previously exported. Once
the data is retrieved, you can load it into the controller block by selecting OK
or Apply. Notice that the window only has an entry for the plant model,
even though you saved both the plant model and the controller. This is
because the NARMA-L2 controller is derived directly from the plant model,
so you don't need to import both networks.
[Figure: Import Neural Network Plant Parameters window, shown in two stages. Fields: Import From (Workspace or MAT-file), MAT-file name with a Browse button, MAT-file Contents, and Neural Network Models (Plant). Callouts: "Select MAT-file and select Browse." "Available MAT-files will appear here. Select the appropriate file; then select Open." (File name: test.mat) "The available networks appear here." (netn_plant) "Select the appropriate plant and/or controller and move them into the desired position and select OK." Buttons: Help, OK, Cancel.]
Importing and Exporting Training Data

The data that you generate to train networks exists only in the corresponding
plant identification or controller training window. You may wish to save the
training data to the workspace or to a disk file so that you can load it again at
a later time. You may also want to combine data sets manually and then load
them back into the training window. You can do this by using the Import and
Export buttons. The following demonstration leads you through the import and
export processes. (We use the NN Predictive Control window for this
demonstration, but the same procedure applies to all of the controllers.)


1 Repeat the first five steps of the NN Predictive Control demonstration. Then
select Accept Data. The Plant Identification window should then be open,
and the Import and Export buttons should be active.


2 Select the Export button. This causes the following window to open. 















[Figure: Export Data window. Data Structure Name: tr_dat. Buttons: Export to Disk, Export to Workspace, Help, Cancel. Callouts: "You can select a name for the data structure. The structure contains at least two fields: name.U and name.Y. These two fields contain the input and output arrays." "You can export the data to the workspace or to a disk file."]
The siructure contoins 




















3 Select Export to Disk. The following window opens. Enter the filename
testdat in the box, and select Save. This saves the training data structure
to disk.
[Figure: Save dialog (Save in: temp; existing file test.mat). Callout: "The filename goes here." File name: testdat; Save as type: MAT-files (*.mat)]





4 Now let's retrieve the data with the import command. Select the Import
button in the Plant Identification window. This causes the following window
to appear. Follow the steps indicated in the following figure to retrieve the
data that you previously exported. Once the data is imported, you can train
the neural network plant model.
[Figure: Import Data window, shown in two stages. Fields: Type of Data (Structures or Arrays), Import From (Workspace or MAT-file), MAT-file name with a Browse button, MAT-file Contents, and Input Output Variables (Structure Name, or Input Array and Output Array). Callouts: "Select MAT-file and select Browse." "Available MAT-files will appear here. Select the appropriate file; then select Open." (File name: testdat.mat) "The available data appears here." "The data can be imported as two arrays (input and output) or as a structure that contains at least two fields: name.U and name.Y." "Select the appropriate data structure or array and move it into the desired position and select OK."]
Summary


The following table summarizes the controllers discussed in this chapter. 





Block                     Description

NN Predictive Control     Uses a neural network plant model to predict future
                          plant behavior. An optimization algorithm determines
                          the control input that optimizes plant performance
                          over a finite time horizon. The plant training
                          requires only a batch algorithm for static networks
                          and is reasonably fast. The controller requires an
                          online optimization algorithm, which requires more
                          computation than the other controllers.

NARMA-L2 Control          An approximate plant model is in companion form. The
                          next control input is computed to force the plant
                          output to follow a reference signal. The neural
                          network plant model is trained with static
                          backpropagation and is reasonably fast. The
                          controller is a rearrangement of the plant model,
                          and requires minimal online computation.

Model Reference Control   A neural network plant model is first developed. The
                          plant model is then used to train a neural network
                          controller to force the plant output to follow the
                          output of a reference model. This control
                          architecture requires the use of dynamic
                          backpropagation for training the controller. This
                          generally takes more time than training static
                          networks with the standard backpropagation
                          algorithm. However, this approach applies to a more
                          general class of plant than does the NARMA-L2
                          control architecture. The controller requires
                          minimal online computation.
Introduction


Radial basis networks may require more neurons than standard feed-forward
backpropagation networks, but often they can be designed in a fraction of the
time it takes to train standard feed-forward networks. They work best when
many training vectors are available.


You may want to consult the following paper on this subject: 


Chen, S., C. F. N. Cowan, and P. M. Grant, "Orthogonal Least Squares Learning
Algorithm for Radial Basis Function Networks," IEEE Transactions on Neural
Networks, vol. 2, no. 2, March 1991, pp. 302-309.


This chapter discusses two variants of radial basis networks, generalized
regression neural networks (GRNN) and probabilistic neural networks (PNN). You
may want to read about them in P. D. Wasserman, Advanced Methods in Neural
Computing, New York: Van Nostrand Reinhold, 1993, on pp. 155-61
and pp. 35-55, respectively.


Important Radial Basis Functions
Radial basis networks can be designed with either newrbe or newrb. GRNN and 
PNN can be designed with newgrnn and newpnn, respectively. 


Type help radbasis to see a listing of all functions and demonstrations related
to radial basis networks.
Radial Basis Functions


Neuron Model 
Here is a radial basis neuron with R inputs.


[Figure: Radial basis neuron with R inputs: a = radbas(||w - p|| b)]


Notice that the expression for the net input of a radbas neuron is different from
that of neurons in previous chapters. Here the net input to the radbas transfer
function is the vector distance between its weight vector w and the input vector
p, multiplied by the bias b. (The ||dist|| box in this figure accepts the input
vector p and the single-row input weight matrix, and produces the distance
between the two.)


The transfer function for a radial basis neuron is:


$$\mathrm{radbas}(n) = e^{-n^2}$$


Here is a plot of the radbas transfer function.






[Figure: Radial Basis Function, a = radbas(n): peak value 1 at n = 0, falling to 0.5 at n = -0.833 and n = +0.833]
The radial basis function has a maximum of 1 when its input is 0. As the
distance between w and p decreases, the output increases. Thus, a radial basis
neuron acts as a detector that produces 1 whenever the input p is identical to
its weight vector w.


The bias b allows the sensitivity of the radbas neuron to be adjusted. For
example, if a neuron had a bias of 0.1 it would output 0.5 for any input vector
p at vector distance of 8.326 (0.8326/b) from its weight vector w.
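The transfer function and the bias example can be checked numerically. The following minimal Python sketch (standard library only, not toolbox code) uses a stand-in for radbas plus two illustrative helpers, euclidean_distance and radbas_neuron, which are hypothetical names. It confirms that a neuron with bias 0.1 outputs 0.5 at a distance of 0.8326/b from its weight vector.

```python
import math

def radbas(n):
    """Radial basis transfer function: radbas(n) = exp(-n**2)."""
    return math.exp(-n * n)

def euclidean_distance(w, p):
    """Vector distance ||w - p|| between a weight vector w and an input p."""
    return math.sqrt(sum((wi - pi) ** 2 for wi, pi in zip(w, p)))

def radbas_neuron(w, p, b):
    """Output of a single radial basis neuron: radbas(||w - p|| * b)."""
    return radbas(euclidean_distance(w, p) * b)

# Net input at which radbas crosses 0.5: sqrt(-ln 0.5), about 0.8326.
HALF_POINT = math.sqrt(-math.log(0.5))

if __name__ == "__main__":
    w = [0.0, 0.0]
    b = 0.1
    print(radbas_neuron(w, [0.0, 0.0], b))        # input equals w -> 1.0
    p = [HALF_POINT / b, 0.0]                      # distance of about 8.326 from w
    print(round(radbas_neuron(w, p, b), 6))        # -> 0.5
```

Because the bias multiplies the distance, a larger b shrinks the region where the neuron responds, and a smaller b widens it.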


Network Architecture


Radial basis networks consist of two layers: a hidden radial basis layer of S1
neurons, and an output linear layer of S2 neurons.


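[Figure: Radial basis network architecture. Input p (R elements) -> Radial Basis Layer (S1 neurons; input weights IW1,1 of size S1 x R, bias b1 of size S1 x 1) -> Linear Layer (S2 neurons; layer weights LW2,1 of size S2 x S1, bias b2 of size S2 x 1). a1_i = radbas(||iIW1,1 - p|| b1_i), a2 = purelin(LW2,1 a1 + b2), where a1_i is the ith element of a1 and iIW1,1 is a vector made of the ith row of IW1,1. R = number of elements in input vector; S1 = number of neurons in layer 1; S2 = number of neurons in layer 2.]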


The ||dist|| box in this figure accepts the input vector p and the input weight
matrix IW1,1, and produces a vector having S1 elements. The elements are the
distances between the input vector and vectors iIW1,1 formed from the rows of
the input weight matrix.


The bias vector b1 and the output of ||dist|| are combined with the MATLAB
operation .*, which does element-by-element multiplication.


The output of the first layer for a feed-forward network net can be obtained with
the following code:


a{1} = radbas(netprod(dist(net.IW{1,1},p),net.b{1}))
Fortunately, you won't have to write such lines of code. All of the details of
designing this network are built into the design functions newrbe and newrb, and
their outputs can be obtained with sim.


We can understand how this network behaves by following an input vector p
through the network to the output a2. If we present an input vector to such a
network, each neuron in the radial basis layer will output a value according to
how close the input vector is to each neuron's weight vector.


Thus, radial basis neurons with weight vectors quite different from the input
vector p have outputs near zero. These small outputs have only a negligible
effect on the linear output neurons.


In contrast, a radial basis neuron with a weight vector close to the input vector
p produces a value near 1. If a neuron has an output of 1, its output weights in
the second layer pass their values to the linear neurons in the second layer.


In fact, if only one radial basis neuron had an output of 1, and all others had
outputs of 0 (or very close to 0), the output of the linear layer would be the
active neuron's output weights (plus the bias b2). This would, however, be an
extreme case. Typically several neurons are always firing, to varying degrees.


Now let us look in detail at how the first layer operates. Each neuron's
weighted input is the distance between the input vector and its weight vector,
calculated with dist. Each neuron's net input is the element-by-element
product of its weighted input with its bias, calculated with netprod. Each
neuron's output is its net input passed through radbas. If a neuron's weight
vector is equal to the input vector (transposed), its weighted input is 0, its net
input is 0, and its output is 1. If a neuron's weight vector is a distance of spread
from the input vector, its weighted input is spread, its net input is sqrt(-log(.5))
(or 0.8326), and therefore its output is 0.5.
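The layer-1 computation a{1} = radbas(netprod(dist(net.IW{1,1},p),net.b{1})) can be mirrored in plain Python. The sketch below uses simplified stand-ins for dist, netprod, and radbas that work on a single input vector and use lists instead of matrices. The weight rows and the spread value are made up for illustration, and each bias is set to 0.8326/spread as described for newrbe.

```python
import math

def radbas(n):
    """Radial basis transfer function, exp(-n^2)."""
    return math.exp(-n * n)

def dist(W, p):
    """Distance from input p to each row of W (one row per neuron)."""
    return [math.sqrt(sum((w - x) ** 2 for w, x in zip(row, p))) for row in W]

def netprod(z, b):
    """Element-by-element product of weighted inputs z and biases b (MATLAB .*)."""
    return [zi * bi for zi, bi in zip(z, b)]

def radial_basis_layer(W, b, p):
    """a1 = radbas(netprod(dist(W, p), b)), element by element."""
    return [radbas(n) for n in netprod(dist(W, p), b)]

if __name__ == "__main__":
    spread = 2.0
    W = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]           # made-up neuron weights
    b = [math.sqrt(-math.log(0.5)) / spread] * len(W)    # 0.8326/spread for each neuron
    p = [0.0, 0.0]
    print([round(a, 4) for a in radial_basis_layer(W, b, p)])  # -> [1.0, 0.5, 0.0]
```

The three neurons illustrate the three cases in the text: a weight vector equal to the input (output 1), one at a distance of spread (output 0.5), and one far away (output near 0).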


Exact Design (newrbe) 


Radial basis networks can be designed with the function newrbe. This function
can produce a network with zero error on training vectors. It is called in the
following way:


net = newrbe(P,T,SPREAD)

The function newrbe takes matrices of input vectors P and target vectors T, and
a spread constant SPREAD for the radial basis layer, and returns a network with
weights and biases such that the outputs are exactly T when the inputs are P.


The function newrbe creates as many radbas neurons as there are input
vectors in P, and sets the first-layer weights to P'. Thus, we have a layer of
radbas neurons in which each neuron acts as a detector for a different input
vector. If there are Q input vectors, then there will be Q neurons.


Each bias in the first layer is set to 0.8326/SPREAD. This gives radial basis
functions that cross 0.5 at weighted inputs of +/- SPREAD. This determines the
width of an area in the input space to which each neuron responds. If SPREAD
is 4, then each radbas neuron will respond with 0.5 or more to any input vectors
within a vector distance of 4 from their weight vector. As we shall see, SPREAD
should be large enough that neurons respond strongly to overlapping regions
of the input space.


The second-layer weights LW2,1 (or in code, LW{2,1}) and biases b2 (or in code,
b{2}) are found by simulating the first-layer outputs a1 (A{1}), and then solving
the following linear expression:


[LW{2,1} b{2}] * [A{1}; ones(1,Q)] = T


We know the inputs to the second layer (A{1}) and the target (T), and the layer
is linear. We can use the following code to calculate the weights and biases of
the second layer to minimize the sum-squared error:


Wb = T/[A{1}; ones(1,Q)]


Here Wb contains both weights and biases, with the biases in the last column. 
The sum-squared error will always be 0, as explained below. 


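We have a problem with Q constraints (input/target pairs), and each neuron has
Q+1 variables (the Q weights from the Q radbas neurons, and a bias). A linear
problem with Q consistent constraints and more than Q variables has an infinite
number of zero-error solutions! (The constraints are consistent here because the
first-layer output vectors for distinct input vectors are linearly independent.)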
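A plain-Python sketch can make the zero-error argument concrete. It is an illustration under stated simplifications, not the newrbe source: targets are scalars, one neuron is centred on each training input, first-layer biases are 0.8326/SPREAD, and the second-layer bias is fixed at 0 so that the square Q x Q system of first-layer outputs can be solved directly. Setting the bias to 0 simply picks one of the many zero-error solutions. The names solve, design_exact, and simulate are hypothetical.

```python
import math

def radbas(n):
    return math.exp(-n * n)

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(M, y):
    """Solve the square system M x = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(M[i]) + [y[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x

def design_exact(P, T, spread):
    """One radbas neuron per training input; second-layer weights solved exactly.

    Simplification: the second-layer bias is fixed at 0.
    """
    b1 = math.sqrt(-math.log(0.5)) / spread            # 0.8326/SPREAD
    # A1[q][i] = output of neuron i (centred on P[i]) for training input P[q]
    A1 = [[radbas(distance(c, p) * b1) for c in P] for p in P]
    w2 = solve(A1, T)
    return {"centers": P, "b1": b1, "w2": w2, "b2": 0.0}

def simulate(net, p):
    """Network output for input p: linear layer applied to the radbas outputs."""
    a1 = [radbas(distance(c, p) * net["b1"]) for c in net["centers"]]
    return sum(w * a for w, a in zip(net["w2"], a1)) + net["b2"]

if __name__ == "__main__":
    P = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]
    T = [1.0, 0.25, 0.0, 0.25, 1.0]
    net = design_exact(P, T, spread=0.5)
    print([simulate(net, p) for p in P])   # should match T up to rounding error
```

Only the design inputs are guaranteed to be reproduced exactly; how well the network behaves between them depends on SPREAD, as the next paragraph explains.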


Thus, newrbe creates a network with zero error on training vectors. The only
condition we have to meet is to make sure that SPREAD is large enough that
the active input regions of the radbas neurons overlap enough so that several
radbas neurons always have fairly large outputs at any given moment. This
makes the network function smoother and results in better generalization for
new input vectors occurring between input vectors used in the design.
(However, SPREAD should not be so large that each neuron is effectively
responding in the same, large, area of the input space.)


The drawback to newrbe is that it produces a network with as many hidden
neurons as there are input vectors. For this reason, newrbe does not return an
acceptable solution when many input vectors are needed to properly define a
network, as is typically the case.


More Efficient Design (newrb) 


The function newrb iteratively creates a radial basis network one neuron at a
time. Neurons are added to the network until the sum-squared error falls
beneath an error goal or a maximum number of neurons has been reached. The
call for this function is:


net = newrb(P,T,GOAL,SPREAD)


The function newrb takes matrices of input and target vectors, P and T, and
design parameters GOAL and SPREAD, and returns the desired network.


The design method of newrb is similar to that of newrbe. The difference is that
newrb creates neurons one at a time. At each iteration the input vector that
results in lowering the network error the most is used to create a radbas
neuron. The error of the new network is checked, and if low enough newrb is
finished. Otherwise the next neuron is added. This procedure is repeated until
the error goal is met, or the maximum number of neurons is reached.
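The one-neuron-at-a-time idea can be illustrated with a brute-force greedy forward selection. This Python sketch is a stand-in resting on several assumptions, not the toolbox algorithm. It refits every remaining candidate with ordinary least squares (normal equations plus a tiny ridge term for numerical safety) instead of using the orthogonal least squares method of the Chen, Cowan, and Grant paper cited earlier. It also works only with scalar targets. The function names are hypothetical.

```python
import math

def radbas(n):
    return math.exp(-n * n)

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(M, y):
    """Gaussian elimination with partial pivoting for a square system M x = y."""
    n = len(M)
    A = [list(M[i]) + [y[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def fit_sse(columns, T, ridge=1e-10):
    """Least-squares fit of T on the given columns (normal equations); returns the SSE."""
    G = [[sum(u * v for u, v in zip(ci, cj)) + (ridge if i == j else 0.0)
          for j, cj in enumerate(columns)] for i, ci in enumerate(columns)]
    h = [sum(u * t for u, t in zip(c, T)) for c in columns]
    w = solve(G, h)
    fitted = [sum(w[j] * columns[j][q] for j in range(len(columns)))
              for q in range(len(T))]
    return sum((f - t) ** 2 for f, t in zip(fitted, T))

def design_incremental(P, T, goal, spread, max_neurons):
    """Greedy sketch: add the training input whose neuron lowers the SSE the most."""
    b1 = math.sqrt(-math.log(0.5)) / spread
    # Column i holds the outputs of a neuron centred on P[i] for every training input.
    candidates = [[radbas(distance(c, p) * b1) for p in P] for c in P]
    bias = [1.0] * len(P)
    chosen = []
    sse = fit_sse([bias], T)                 # start from a bias-only network
    history = [sse]
    max_neurons = min(max_neurons, len(P))
    while sse > goal and len(chosen) < max_neurons:
        best_i, best_sse = None, None
        for i in range(len(P)):
            if i in chosen:
                continue
            s = fit_sse([candidates[j] for j in chosen + [i]] + [bias], T)
            if best_sse is None or s < best_sse:
                best_i, best_sse = i, s
        chosen.append(best_i)
        sse = best_sse
        history.append(sse)
    return chosen, sse, history
```

For example, design_incremental([[0.0], [1.0], [2.0], [3.0], [4.0]], [0.0, 1.0, 0.0, 2.0, 0.0], goal=1e-6, spread=0.5, max_neurons=4) returns the indices of the chosen centres, the final sum-squared error, and the error after each addition. Refitting every candidate is simple but costs far more computation than an orthogonalized update would.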


As with newrbe, it is important that the spread parameter be large enough that
the radbas neurons respond to overlapping regions of the input space, but not
so large that all the neurons respond in essentially the same manner.


Why not always use a radial basis network instead of a standard feed-forward
network? Radial basis networks, even when designed efficiently with newrb,
tend to have many times more neurons than a comparable feed-forward
network with tansig or logsig neurons in the hidden layer.


This is because sigmoid neurons can have outputs over a large region of the 
input space, while radbas neurons only respond to relatively small regions of 
the input space. The result is that the larger the input space (in terms of 
number of inputs, and the ranges those inputs vary over) the more radbas 
neurons required. 


On the other hand, designing a radial basis network often takes much less time
than training a sigmoid/linear network, and can sometimes result in fewer
neurons being used, as can be seen in the next demonstration.
Demonstrations


The demonstration script demorb1 shows how a radial basis network is used to
fit a function. Here the problem is solved with only five neurons.


Demonstration scripts demorb3 and demorb4 examine how the spread constant
affects the design process for radial basis networks.


In demorb3, a radial basis network is designed to solve the same problem as in
demorb1. However, this time the spread constant used is 0.01. Thus, each
radial basis neuron returns 0.5 or lower for any input vector with a distance
of 0.01 or more from its weight vector.


Because the training inputs occur at intervals of 0.1, no two radial basis 
neurons have a strong output for any given input. 


In demorb3, it was demonstrated that having too small a spread constant can
result in a solution that does not generalize from the input/target vectors used
in the design. This demonstration, demorb4, shows the opposite problem. If the
spread constant is large enough, the radial basis neurons will output large
values (near 1.0) for all the inputs used to design the network.


If all the radial basis neurons always output 1, any information presented to
the network becomes lost. No matter what the input, the first layer outputs all
1's, so the second layer produces the same output for every input. The function
newrb will attempt to find a network, but will not be able to do so because of
numerical problems that arise in this situation.


The moral of the story is: choose a spread constant larger than the distance
between adjacent input vectors, so as to get good generalization, but smaller
than the distance across the whole input space.


For this problem that would mean picking a spread constant greater than 0.1,
the interval between inputs, and less than 2, the distance between the
left-most and right-most inputs.
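The rule of thumb can be turned into a quick check. This Python sketch makes two assumptions about the wording. It reads "distance between adjacent input vectors" as the largest nearest-neighbour distance, and "distance across the whole input space" as the largest pairwise distance. The example inputs are an assumed set spaced 0.1 apart across a width of 2, matching the description above.

```python
import math

def spread_bounds(P):
    """Suggested (lower, upper) limits for the spread constant.

    lower: largest nearest-neighbour distance among the inputs (assumed reading
           of "distance between adjacent input vectors")
    upper: largest distance between any two inputs (assumed reading of
           "distance across the whole input space")
    """
    def d(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    nearest = [min(d(p, q) for j, q in enumerate(P) if j != i)
               for i, p in enumerate(P)]
    diameter = max(d(p, q) for p in P for q in P)
    return max(nearest), diameter

if __name__ == "__main__":
    P = [[-1 + 0.1 * i] for i in range(21)]   # assumed inputs: -1, -0.9, ..., 1
    low, high = spread_bounds(P)
    print(round(low, 6), round(high, 6))       # -> 0.1 2.0
```

Any spread strictly between the two printed values satisfies the guideline. The quadratic pairwise loops are fine for small design sets like this one.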










Generalized Regression Networks 


A generalized regression neural network (GRNN) is often used for function
approximation. As discussed below, it has a radial basis layer and a special
linear layer.


Network Architecture
The architecture for the GRNN is shown below. It is similar to the radial basis
network, but has a slightly different second layer.


[Figure: GRNN architecture. Input p (R elements) -> Radial Basis Layer (Q neurons; weights IW1,1, bias b1) -> Special Linear Layer (Q neurons; weights LW2,1 feeding an nprod box). a1_i = radbas(||iIW1,1 - p|| b1_i), a2 = purelin(n2), where a1_i is the ith element of a1 and iIW1,1 is a vector made of the ith row of IW1,1. R = no. of elements in input vector; Q = no. of neurons in layer 1 = no. of neurons in layer 2 = no. of input/target pairs.]


Here the nprod box shown above (code function normprod) produces S2
elements in vector n2. Each element is the dot product of a row of LW2,1 and
the input vector a1, all normalized by the sum of the elements of a1. For
instance, suppose that:


LW{2,1} = [1 -2; 3 4; 5 6];
a{1} = [7; -8];

Then

aout = normprod(LW{2,1},a{1})
aout =
   -23
    11
    13
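The normprod arithmetic can be checked with a short Python stand-in for the toolbox function, which assumes that sum(a) is nonzero. With a1 = [7; -8], whose elements sum to -1, it reproduces the output -23, 11, 13. In a real GRNN the elements of a1 are radbas outputs and are therefore all positive. The negative entry here is only an arithmetic illustration.

```python
def normprod(W, a):
    """Normalized dot product: each row of W dotted with a, divided by sum(a).

    Assumes sum(a) != 0.
    """
    total = sum(a)
    return [sum(wij * aj for wij, aj in zip(row, a)) / total for row in W]

if __name__ == "__main__":
    LW = [[1, -2], [3, 4], [5, 6]]
    a1 = [7, -8]
    print(normprod(LW, a1))   # -> [-23.0, 11.0, 13.0]
```

Because of the division by sum(a1), scaling a1 by any nonzero constant leaves the result unchanged. This is why a GRNN output behaves like a weighted average.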
The first layer is just like that for newrbe networks. It has as many neurons as
there are input/target vectors in P. Specifically, the first-layer weights are set
to P'. The bias b1 is set to a column vector of 0.8326/SPREAD. The user chooses
SPREAD, the distance an input vector must be from a neuron's weight vector for
the neuron's output to be 0.5.


Again, the first layer operates just like the newrbe radial basis layer described
previously. Each neuron's weighted input is the distance between the input
vector and its weight vector, calculated with dist. Each neuron's net input is
the product of its weighted input with its bias, calculated with netprod. Each
neuron's output is its net input passed through radbas. If a neuron's weight
vector is equal to the input vector (transposed), its weighted input will be 0, its
net input will be 0, and its output will be 1. If a neuron's weight vector is a
distance of spread from the input vector, its weighted input will be spread, and
its net input will be sqrt(-log(.5)) (or 0.8326). Therefore its output will be 0.5.


The second layer also has as many neurons as input/target vectors, but here
LW{2,1} is set to T.


Suppose we have an input vector p close to p_i, one of the input vectors among
the input vector/target pairs used in designing the layer 1 weights. This input p
produces a layer 1 output a1_i close to 1. This leads to a layer 2 output close to
t_i, one of the targets used in forming the layer 2 weights.


A larger spread leads to a large area around the input vector where layer 1
neurons will respond with significant outputs. Conversely, if spread is small, the
radial basis function is very steep, so the neuron with the weight vector
closest to the input will have a much larger output than other neurons. The
network will then tend to respond with the target vector associated with the
nearest design input vector.


As spread gets larger, the radial basis function's slope gets smoother and
several neurons may respond to an input vector. The network then acts as if it
is taking a weighted average between target vectors whose design input vectors
are closest to the new input vector. As spread gets larger, more and more
neurons contribute to the average, with the result that the network function
becomes smoother.
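The weighted-average behaviour can be sketched in Python for scalar inputs and targets. This is an illustration, not newgrnn. Layer 1 uses radbas with bias 0.8326/spread, layer 2 is normprod with LW{2,1} = T, and spread = 1.0 is an assumed value. The three calls show the effect of spread described above. A tiny spread returns the target of the nearest design input, while a huge spread approaches the plain mean of the targets.

```python
import math

def radbas(n):
    return math.exp(-n * n)

def grnn_predict(P, T, p, spread=1.0):
    """GRNN sketch for scalar inputs P and scalar targets T (spread is assumed).

    Raises ZeroDivisionError if every radbas output underflows to 0,
    which can happen for a tiny spread and a far-away input.
    """
    b1 = math.sqrt(-math.log(0.5)) / spread               # first-layer bias
    a1 = [radbas(abs(x - p) * b1) for x in P]             # layer 1: radbas(dist .* b)
    return sum(t * a for t, a in zip(T, a1)) / sum(a1)    # layer 2: normprod(T, a1)

if __name__ == "__main__":
    P = [4.0, 5.0, 6.0]
    T = [1.5, 3.6, 6.7]
    print(grnn_predict(P, T, 4.5))                    # weighted average of the targets
    print(grnn_predict(P, T, 5.0, spread=0.01))       # nearest design target, 3.6
    print(grnn_predict(P, T, 4.5, spread=1000.0))     # close to the mean of T
```

The data in the usage lines are the same three input/target pairs used in the newgrnn example of this section.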


Design (newgrnn) 
You can use the function newgrnn to create a GRNN. For instance, suppose that
three input and three target vectors are defined as:










P = [4 5 6];
T = [1.5 3.6 6.7];


We can now obtain a GRNN with

net = newgrnn(P,T);

and simulate it with

P = 4.5;
v = sim(net,P)


You might want to try demogrn1. It shows how to approximate a function with
a GRNN.
Probabilistic Neural Networks


Probabilistic neural networks can be used for classification problems. When an
input is presented, the first layer computes distances from the input vector to
the training input vectors, and produces a vector whose elements indicate how
close the input is to a training input. The second layer sums these contributions
for each class of inputs to produce as its net output a vector of probabilities.
Finally, a compet transfer function on the output of the second layer picks the
maximum of these probabilities, and produces a 1 for that class and a 0 for the
other classes. The architecture for this system is shown below.


Network Architecture


[Figure: PNN architecture. Input p (R elements) -> Radial Basis Layer (Q neurons; weights IW1,1, bias b1) -> Competitive Layer (K neurons; weights LW2,1). a1_i = radbas(||iIW1,1 - p|| b1_i), a2 = compet(LW2,1 a1), where a1_i is the ith element of a1 and iIW1,1 is a vector made of the ith row of IW1,1. R = number of elements in input vector; Q = number of input/target pairs = number of neurons in layer 1; K = number of classes of input data = number of neurons in layer 2.]


It is assumed that there are Q input vector/target vector pairs. Each target
vector has K elements. One of these elements is 1 and the rest are 0. Thus, each
input vector is associated with one of K classes.


The first-layer input weights, IW1,1 (net.IW{1,1}), are set to the transpose of
the matrix formed from the Q training input vectors, P'. When an input is
presented, the ||dist|| box produces a vector whose elements indicate how close
the input is to the vectors of the training set. These elements are multiplied,
element by element, by the bias and sent to the radbas transfer function. An
input vector close to a training vector is represented by a number close to 1 in
the output vector a1. If an input is close to several training vectors of a single
class, it is represented by several elements of a1 that are close to 1.







The second-layer weights, LW2,1 (net.LW{2,1}), are set to the matrix T of
target vectors. Each vector has a 1 only in the row associated with that
particular class of input, and 0's elsewhere. (The function ind2vec is used to
create the proper vectors.) The multiplication T a1 sums the elements of a1 due
to each of the K input classes. Finally, the second-layer transfer function,
compet, produces a 1 corresponding to the largest element of n2, and 0's
elsewhere. Thus, the network has classified the input vector into a specific one
of K classes because that class had the maximum probability of being correct.
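The PNN forward pass can be sketched in Python. The sketch is a hypothetical stand-in for a network built by newpnn, and it differs from the toolbox in three ways. It stores the training inputs as rows. It takes 1-based class labels (as vec2ind would report them) instead of the one-hot matrix T. It also uses an assumed spread of 0.1. Summing the radbas outputs per class plays the role of T a1, and picking the largest sum plays the role of compet. The usage lines reuse the seven training vectors and three test vectors from the newpnn example in this section.

```python
import math

def radbas(n):
    return math.exp(-n * n)

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pnn_classify(P, classes, p, spread=0.1):
    """Classify input p; P holds training inputs as rows, classes their 1-based labels."""
    b1 = math.sqrt(-math.log(0.5)) / spread
    a1 = [radbas(distance(c, p) * b1) for c in P]      # layer 1 outputs
    n2 = [0.0] * max(classes)
    for a, k in zip(a1, classes):                      # like T*a1: sum a1 per class
        n2[k - 1] += a
    return n2.index(max(n2)) + 1                       # like compet: largest sum wins

if __name__ == "__main__":
    P = [[0, 0], [1, 1], [0, 3], [1, 4], [3, 1], [4, 1], [4, 3]]
    Tc = [1, 1, 2, 2, 3, 3, 3]
    print([pnn_classify(P, Tc, p) for p in P])                          # -> [1, 1, 2, 2, 3, 3, 3]
    print([pnn_classify(P, Tc, p) for p in [[1, 4], [0, 1], [5, 2]]])   # -> [2, 1, 3]
```

No training loop is needed. The design step only copies the training inputs and labels into the two layers, which is why PNN design is fast but simulation grows with the size of the training set.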


Design (newpnn) 
You can use the function newpnn to create a PNN. For instance, suppose that
seven input vectors and their corresponding targets are


P = [0 0; 1 1; 0 3; 1 4; 3 1; 4 1; 4 3]'

which yields

P =
     0     1     0     1     3     4     4
     0     1     3     4     1     1     3

Tc = [1 1 2 2 3 3 3]

which yields

Tc =
     1     1     2     2     3     3     3


We need a target matrix with 1's in the right places. We can get it with the
function ind2vec. It gives a matrix with 0's except at the correct spots. So
execute


T = ind2vec(Tc)

which gives a sparse matrix equivalent to

     1     1     0     0     0     0     0
     0     0     1     1     0     0     0
     0     0     0     0     1     1     1
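Dense Python stand-ins for ind2vec and vec2ind make the conversion between class indices and target vectors concrete. The toolbox's ind2vec returns a sparse matrix, whereas this sketch returns a plain list of rows. The function names mirror the toolbox names only for readability.

```python
def ind2vec(indices):
    """Dense stand-in for ind2vec: column j has a 1 in row indices[j] (1-based)."""
    n_rows = max(indices)
    return [[1 if idx == r + 1 else 0 for idx in indices] for r in range(n_rows)]

def vec2ind(M):
    """Dense stand-in for vec2ind: 1-based row of the largest entry in each column."""
    n_cols = len(M[0])
    return [max(range(len(M)), key=lambda r: M[r][j]) + 1 for j in range(n_cols)]

if __name__ == "__main__":
    T = ind2vec([1, 1, 2, 2, 3, 3, 3])
    for row in T:
        print(row)      # rows of the 3 x 7 target matrix
    print(vec2ind(T))   # -> [1, 1, 2, 2, 3, 3, 3]
```

Applying vec2ind to the output of ind2vec recovers the original class indices, which is how the network output Y is turned back into the row Yc below.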
Now we can create a network and simulate it, using the input P to make sure
that it does produce the correct classifications. We use the function vec2ind to
convert the output Y into a row Yc to make the classifications clear.


net = newpnn(P,T);
Y = sim(net,P)
Yc = vec2ind(Y)


Finally we get

Yc =
     1     1     2     2     3     3     3


We might try classifying vectors other than those that were used to design the
network. We will try to classify the vectors shown below in P2.


P2 = [1 4; 0 1; 5 2]'

P2 =
     1     0     5
     4     1     2


Can you guess how these vectors will be classified? If we simulate the network
with P2 and convert the output with vec2ind as we did before, we get

Yc =
     2     1     3


These results look good, for these test vectors were quite close to members of
classes 2, 1 and 3 respectively. The network has managed to generalize its
operation to properly classify vectors other than those used to design the
network.


You might want to try demopnn1. It shows how to design a PNN, and how the
network can successfully classify a vector not used in the design.







Summary 


Radial basis networks can be designed very quickly in two different ways. 


The first design method, newrbe, finds an exact solution. The function newrbe
creates radial basis networks with as many radial basis neurons as there are
input vectors in the training data.


The second method, newrb, adds neurons one at a time until the network solves
the problem within a given error goal. Typically, far fewer neurons are required
by newrb than are returned by newrbe. However, because the number of radial
basis neurons is proportional to the size of the input space and the complexity
of the problem, radial basis networks can still be larger than backpropagation
networks.


A generalized regression neural network (GRNN) is often used for function
approximation. It has been shown that, given a sufficient number of hidden
neurons, GRNNs can approximate a continuous function to an arbitrary
accuracy.


Probabilistic neural networks (PNN) can be used for classification problems. 
Their design is straightforward and does not depend on training. A PNN is
guaranteed to converge to a Bayesian classifier providing it is given enough 
training data. These networks generalize well. 


The GRNN and PNN have many advantages, but they both suffer from one
major disadvantage. They are slower to operate because they use more
computation than other kinds of networks to do their function approximation
or classification.
Figures


Radial Basis Neuron

[Figure: Radial basis neuron with R inputs: a = radbas(||w - p|| b)]

Radbas Transfer Function

[Figure: Radial Basis Function, a = radbas(n): peak value 1 at n = 0, falling to 0.5 at n = -0.833 and n = +0.833]







Radial Basis Network Architecture

[Figure: Input p (R elements) -> Radial Basis Layer (S1 neurons) -> Linear Layer (S2 neurons). a1_i = radbas(||iIW1,1 - p|| b1_i), a2 = purelin(LW2,1 a1 + b2), where a1_i is the ith element of a1 and iIW1,1 is a vector made of the ith row of IW1,1. R = number of elements in input vector; S1 = number of neurons in layer 1; S2 = number of neurons in layer 2.]


Generalized Regression Neural Network Architecture

[Figure: Input p (R elements) -> Radial Basis Layer (Q neurons) -> Special Linear Layer (Q neurons, nprod box). a1_i = radbas(||iIW1,1 - p|| b1_i), a2 = purelin(n2), where a1_i is the ith element of a1 and iIW1,1 is a vector made of the ith row of IW1,1. R = no. of elements in input vector; Q = no. of neurons in layer 1 = no. of neurons in layer 2 = no. of input/target pairs.]
Probabilistic Neural Network Architecture

[Figure: Input p (R elements) -> Radial Basis Layer (Q neurons) -> Competitive Layer (K neurons). a1_i = radbas(||iIW1,1 - p|| b1_i), a2 = compet(LW2,1 a1), where a1_i is the ith element of a1 and iIW1,1 is a vector made of the ith row of IW1,1. R = number of elements in input vector; Q = number of input/target pairs = number of neurons in layer 1; K = number of classes of input data = number of neurons in layer 2.]


New Functions
This chapter introduces the following new functions.








Function     Description

compet       Competitive transfer function.
dist         Euclidean distance weight function.
dotprod      Dot product weight function.
ind2vec      Convert indices to vectors.
negdist      Negative Euclidean distance weight function.
netprod      Product net input function.
newgrnn      Design a generalized regression neural network.
newpnn       Design a probabilistic neural network.
newrb        Design a radial basis network.
newrbe       Design an exact radial basis network.
normprod     Normalized dot product weight function.
radbas       Radial basis transfer function.
vec2ind      Convert vectors to indices.







Self-Organizing and Learn. Vector Quant. Nets

Introduction
  Important Self-Organizing and LVQ Functions
Competitive Learning
  Architecture
  Creating a Competitive Neural Network (newc)
  Kohonen Learning Rule (learnk)
  Bias Learning Rule (learncon)
  Training
  Graphical Example
Self-Organizing Maps
  Topologies (gridtop, hextop, randtop)
  Distance Functions (dist, linkdist, mandist, boxdist)
  Architecture
  Creating a Self-Organizing MAP Neural Network (newsom)
  Training (learnsom)
  Examples
Learning Vector Quantization Networks
  Architecture
  Creating an LVQ Network (newlvq)
  LVQ1 Learning Rule (learnlv1)
  Training
  Supplemental LVQ2.1 Learning Rule (learnlv2)
Summary and Conclusions
  Self-Organizing Maps
  Learning Vector Quantization Networks
  Figures
  New Functions







Introduction


Self-organizing in networks is one of the most fascinating topics in the neural
network field. Such networks can learn to detect regularities and correlations
in their input and adapt their future responses to that input accordingly. The
neurons of competitive networks learn to recognize groups of similar input
vectors. Self-organizing maps learn to recognize groups of similar input vectors
in such a way that neurons physically near each other in the neuron layer
respond to similar input vectors. A basic reference is


Kohonen, T. Self-Organization and Associative Memory, 2nd Edition, Berlin:
Springer-Verlag, 1987.


Learning vector quantization (LVQ) is a method for training competitive layers
in a supervised manner. A competitive layer automatically learns to classify
input vectors. However, the classes that the competitive layer finds are
dependent only on the distance between input vectors. If two input vectors are
very similar, the competitive layer probably will put them in the same class.
There is no mechanism in a strictly competitive layer design to say whether or
not any two input vectors are in the same class or different classes.


LVQ networks, on the other hand, learn to classify input vectors into target
classes chosen by the user.


You might consult the following reference: 


Kohonen, T. Self-Organization and Associative Memory, 2nd Edition, Berlin:
Springer-Verlag, 1987.


Important Self-Organizing and LVQ Functions 


Competitive layers and self-organizing maps can be created with newc and
newsom, respectively. A listing of all self-organizing functions and
demonstrations can be found by typing help selforg.


An LVQ network can be created with the function newlvq. For a list of all LVQ
functions and demonstrations, type help lvq.







Competitive Learning


The neurons in a competitive layer distribute themselves to recognize
frequently presented input vectors.


Architecture
The architecture for a competitive network is shown below. 


Figure: competitive network architecture (input, competitive layer)





The ||dist|| box in this figure accepts the input vector p and the input weight
matrix IW1,1 and produces a vector having S1 elements. The elements are the
negatives of the distances between the input vector and vectors iIW1,1 formed
from the rows of the input weight matrix.


The net input n1 of a competitive layer is computed by finding the negative
distance between input vector p and the weight vectors and adding the biases
b. If all biases are zero, the maximum net input a neuron can have is 0. This
occurs when the input vector p equals that neuron's weight vector.


The competitive transfer function accepts a net input vector for a layer and
returns neuron outputs of 0 for all neurons except for the winner, the neuron
associated with the most positive element of net input n1. The winner's output
is 1. If all biases are 0, then the neuron whose weight vector is closest to the
input vector has the least negative net input and, therefore, wins the
competition to output a 1.


Reasons for using biases with competitive layers are introduced in a later
section on training.
Creating a Competitive Neural Network (newc)
A competitive neural network can be created with the function newc. We show
how this works with a simple example.


Suppose we want to divide the following four two-element vectors into two
classes.


p = [.1 .8 .1 .9; .2 .9 .1 .8]

p =
    0.1000    0.8000    0.1000    0.9000
    0.2000    0.9000    0.1000    0.8000


Thus, we have two vectors near the origin and two vectors near (1,1). 


First, create a two-neuron layer with two input elements ranging from 0 to 1.
The first argument gives the ranges of the two input elements and the second
argument says that there are to be two neurons.


net = newc([0 1; 0 1],2);


The weights are initialized to the center of the input ranges with the function
midpoint. We can check to see these initial values as follows:


wts = net.IW{1,1}
wts =
0.5000 0.5000 
0.5000 0.5000 


These weights are indeed the values at the midpoint of the range (0 to 1) of the
inputs, as we would expect when using midpoint for initialization.


The biases are computed by initcon, which gives 


biases = 
5.4366 
5.4366 


Now we have a network, but we need to train it to do the classification job.


Recall that each neuron competes to respond to an input vector p. If the biases
are all 0, the neuron whose weight vector is closest to p gets the highest net
input and, therefore, wins the competition and outputs 1. All other neurons
output 0. We would like to adjust the winning neuron so as to move it closer to
the input. A learning rule to do this is discussed in the next section.







Kohonen Learning Rule (learnk) 


The weights of the winning neuron (a row of the input weight matrix) are
adjusted with the Kohonen learning rule. Supposing that the ith neuron wins,
the elements of the ith row of the input weight matrix are adjusted as shown
below.


$${}_{i}\mathbf{IW}^{1,1}(q) = {}_{i}\mathbf{IW}^{1,1}(q-1) + \alpha\left(\mathbf{p}(q) - {}_{i}\mathbf{IW}^{1,1}(q-1)\right)$$


The Kohonen rule allows the weights of a neuron to learn an input vector, and
because of this it is useful in recognition applications.


Thus, the neuron whose weight vector was closest to the input vector is
updated to be even closer. The result is that the winning neuron is more likely
to win the competition the next time a similar vector is presented, and less
likely to win when a very different input vector is presented. As more and more
inputs are presented, each neuron in the layer closest to a group of input
vectors soon adjusts its weight vector toward those input vectors. Eventually,
if there are enough neurons, every cluster of similar input vectors will have a
neuron that outputs 1 when a vector in the cluster is presented, while
outputting a 0 at all other times. Thus, the competitive network learns to
categorize the input vectors it sees.


The function learnk is used to perform the Kohonen learning rule in this
toolbox.
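To make the rule concrete, here is a minimal pure-Python sketch of one Kohonen step for a competitive layer without biases: it computes negative distances, picks the winner the way the competitive transfer function does, and moves only the winning row toward the input. This is not the toolbox's learnk code; the helper names and the learning rate of 0.5 (chosen large so the change is visible) are illustrative assumptions.

```python
import math

def neg_distances(weights, p):
    """Net input without biases: -||w_i - p|| for each weight row w_i."""
    return [-math.dist(w, p) for w in weights]

def winner_take_all(n):
    """1 for the largest net input, 0 elsewhere (ties go to the lowest index)."""
    winner = max(range(len(n)), key=lambda i: n[i])
    return [1 if i == winner else 0 for i in range(len(n))]

def kohonen_update(weights, p, lr):
    """Move only the winning row toward p: w_i <- w_i + lr * (p - w_i)."""
    a = winner_take_all(neg_distances(weights, p))
    new_weights = []
    for w, ai in zip(weights, a):
        if ai == 1:
            w = [wj + lr * (pj - wj) for wj, pj in zip(w, p)]
        new_weights.append(list(w))
    return new_weights, a

# Two neurons, one near the origin and one near (1,1) (illustrative values).
weights = [[0.2, 0.2], [0.8, 0.8]]
weights, a = kohonen_update(weights, [0.1, 0.1], lr=0.5)
print(a)                                            # [1, 0]
print([[round(x, 3) for x in w] for w in weights])  # [[0.15, 0.15], [0.8, 0.8]]
```

Only the first row moves, halfway toward the input, because it was the closest row and the learning rate is 0.5.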


Bias Learning Rule (learncon) 


One of the limitations of competitive networks is that some neurons may not
always get allocated. In other words, some neuron weight vectors may start out
far from any input vectors and never win the competition, no matter how long
the training is continued. The result is that their weights do not get to learn
and they never win. These unfortunate neurons, referred to as dead neurons,
never perform a useful function.


To stop this from happening, biases are used to give neurons that only win the
competition rarely (if ever) an advantage over neurons that win often. A 
positive bias, added to the negative distance, makes a distant neuron more 
likely to win. 


To do this job a running average of neuron outputs is kept. It is equivalent to
the percentages of times each output is 1. This average is used to update the
biases with the learning function learncon so that the biases of frequently
active neurons will get smaller, and biases of infrequently active neurons will
get larger.


The learning rates for learncon are typically set an order of magnitude or more
smaller than for learnk. Doing this helps make sure that the running average
is accurate.


The result is that biases of neurons that haven't responded very frequently will
increase versus biases of neurons that have responded frequently. As the
biases of infrequently active neurons increase, the input space to which that
neuron responds increases. As that input space increases, the infrequently
active neuron responds and moves toward more input vectors. Eventually the
neuron will respond to an equal number of vectors as other neurons.


This has two good effects. First, if a neuron never wins a competition because
its weights are far from any of the input vectors, its bias will eventually get
large enough so that it will be able to win. When this happens, it will move
toward some group of input vectors. Once the neuron's weights have moved
into a group of input vectors and the neuron is winning consistently, its bias
will decrease to 0. Thus, the problem of dead neurons is resolved.


The second advantage of biases is that they force each neuron to classify
roughly the same percentage of input vectors. Thus, if a region of the input
space is associated with a larger number of input vectors than another region,
the more densely filled region will attract more neurons and be classified into
smaller subsections.
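A minimal sketch of the bias ("conscience") mechanism follows. It assumes the update documented for learncon in later toolbox releases: the running average is c = (1 - lr)*c + lr*a and the bias is b = exp(1 - log(c)), which equals e/c. Treat that formula as an assumption about this version of the toolbox. With c = 1/2 for each of two neurons it gives 2e ≈ 5.4366, which agrees with the initial biases shown for the newc example. The function names are hypothetical.

```python
import math

def conscience_bias(c):
    """ASSUMED learncon-style bias: b = exp(1 - ln c) = e / c."""
    return math.exp(1 - math.log(c))

def learncon_step(c, a, lr):
    """Update each neuron's running average output c, then recompute the biases.

    c  -- running averages (fractions of wins; they sum to 1)
    a  -- competitive layer output (a one-hot list)
    lr -- bias learning rate, typically much smaller than the Kohonen rate
    """
    c = [(1 - lr) * ci + lr * ai for ci, ai in zip(c, a)]
    return c, [conscience_bias(ci) for ci in c]

S = 2
c = [1 / S] * S
print([round(conscience_bias(ci), 4) for ci in c])  # [5.4366, 5.4366]

# Neuron 1 wins five times in a row: its bias falls and the other bias rises.
for _ in range(5):
    c, b = learncon_step(c, [1, 0], lr=0.001)
print(b[0] < b[1])  # True
```

Because each update mixes in a one-hot vector, the running averages keep summing to 1, so a neuron can only gain bias when others lose it.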


Training 
Now train the network for 500 epochs. Either train or adapt can be used.


net.trainParam.epochs = 500 
net = train(net,p) ; 


Note that train for competitive networks uses the training function trainr. 
You can verify this by executing the following code after creating the network.


net.trainFcn 


This code produces 


ans = 
trainr 







Thus, during each epoch, a single vector is chosen randomly and presented to 
the network and weight and bias values are updated accordingly. 


Next, supply the original vectors as input to the network, simulate the
network, and finally convert its output vectors to class indices. 


a = sim(net,p)
ac = vec2ind(a)


This yields


ac =
1 2 1 2 


We see that the network is trained to classify the input vectors into two groups,
those near the origin, class 1, and those near (1,1), class 2.


It might be interesting to look at the final weights and biases. They are 


wts =
0.8208 0.8263 
0.1348 0.1787 
biases = 
5.3699 
5.5049 


(You may get different answers if you run this problem, as a random seed is
used to pick the order of the vectors presented to the network for training.)
Note that the first vector (formed from the first row of the weight matrix) is 
near the input vectors close to (1,1), while the vector formed from the second 
row of the weight matrix is close to the input vectors near the origin. Thus, the 
network has been trained, just by exposing it to the inputs, to classify them. 


During training each neuron in the layer closest to a group of input vectors 
adjusts its weight vector toward those input vectors. Eventually, if there are
enough neurons, every cluster of similar input vectors has a neuron that
outputs 1 when a vector in the cluster is presented, while outputting a 0 at all 
other times. Thus, the competitive network learns to categorize the input. 
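The following pure-Python sketch mimics the whole exercise on the four example vectors: a two-neuron competitive layer starts with both weight rows at the midpoint 0.5, is trained with the Kohonen rule alone (the bias mechanism is left out for brevity), and its outputs are then converted to class indices in the spirit of vec2ind. It is not the toolbox's newc/train/sim code; the learning rate of 0.01, the seed, the shuffled once-per-epoch presentation order, and the helper names are all assumptions. Which cluster ends up labeled 1 depends on the presentation order, in line with the caveat about random seeds above.

```python
import math
import random

def closest_neuron(weights, p):
    """1-based index of the weight row nearest to p (like vec2ind of the compet output)."""
    dists = [math.dist(w, p) for w in weights]
    return dists.index(min(dists)) + 1

def train_competitive(inputs, n_neurons, epochs=500, lr=0.01, seed=0):
    """Kohonen-rule training without biases; every weight starts at the midpoint 0.5."""
    rng = random.Random(seed)
    weights = [[0.5] * len(inputs[0]) for _ in range(n_neurons)]
    for _ in range(epochs):
        order = list(range(len(inputs)))
        rng.shuffle(order)  # assumed: each vector once per epoch, in shuffled order
        for q in order:
            p = inputs[q]
            i = closest_neuron(weights, p) - 1  # winning neuron
            weights[i] = [w + lr * (x - w) for w, x in zip(weights[i], p)]
    return weights

# The four example vectors (the columns of p in the text).
points = [(0.1, 0.2), (0.8, 0.9), (0.1, 0.1), (0.9, 0.8)]
wts = train_competitive(points, n_neurons=2)
ac = [closest_neuron(wts, p) for p in points]
print(ac)  # vectors 1 and 3 share one class; vectors 2 and 4 share the other
```

Once one row drifts toward a cluster, the row still at the midpoint is closer to the opposite cluster, so the two neurons split the data without needing biases in this small case.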


Graphical Example 


Competitive layers can be understood better when their weight vectors and 
input vectors are shown graphically. The diagram below shows 48 two-element
input vectors represented with + markers.
Figure: Input Vectors





The input vectors above appear to fall into clusters. You can use a competitive
network of eight neurons to classify the vectors into such clusters.


Try democ1 to see a dynamic example of competitive learning. 
Self-Organizing Maps


Self-organizing feature maps (SOFM) learn to classify input vectors according
to how they are grouped in the input space. They differ from competitive layers
in that neighboring neurons in the self-organizing map learn to recognize
neighboring sections of the input space. Thus, self-organizing maps learn both
the distribution (as do competitive layers) and topology of the input vectors
they are trained on.


The neurons in the layer of an SOFM are arranged originally in physical
positions according to a topology function. The functions gridtop, hextop, or
randtop can arrange the neurons in a grid, hexagonal, or random topology.
Distances between neurons are calculated from their positions with a distance
function. There are four distance functions: dist, boxdist, linkdist, and
mandist. Link distance is the most common. These topology and distance
functions are described in detail later in this section.


Here a self-organizing feature map network identifies a winning neuron $i^*$
using the same procedure as employed by a competitive layer. However,
instead of updating only the winning neuron, all neurons within a certain
neighborhood $N_{i^*}(d)$ of the winning neuron are updated using the Kohonen
rule. Specifically, we adjust all such neurons $i \in N_{i^*}(d)$ as follows.


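$${}_{i}\mathbf{w}(q) = {}_{i}\mathbf{w}(q-1) + \alpha\left(\mathbf{p}(q) - {}_{i}\mathbf{w}(q-1)\right)$$

or

$${}_{i}\mathbf{w}(q) = (1-\alpha)\,{}_{i}\mathbf{w}(q-1) + \alpha\,\mathbf{p}(q)$$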


Here the neighborhood $N_{i^*}(d)$ contains the indices for all of the neurons that
lie within a radius $d$ of the winning neuron $i^*$.


$$N_i(d) = \{\, j : d_{ij} \le d \,\}$$


Thus, when a vector p is presented, the weights of the winning neuron and its
close neighbors move toward p. Consequently, after many presentations,
neighboring neurons will have learned vectors similar to each other.


To illustrate the concept of neighborhoods, consider the figure given below. The
left diagram shows a two-dimensional neighborhood of radius $d = 1$ around
neuron 13. The right diagram shows a neighborhood of radius $d = 2$.

These neighborhoods could be written as:


$N_{13}(1) = \{8, 12, 13, 14, 18\}$ and
$N_{13}(2) = \{3, 7, 8, 9, 11, 12, 13, 14, 15, 17, 18, 19, 23\}$
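The two sets above can be checked with a short sketch. Since the figure is not reproduced here, it assumes that the 25 neurons sit on a 5-by-5 grid numbered row by row from 1, and that the distance $d_{ij}$ is the number of grid steps between neurons (Manhattan distance on the grid). The helper name `neighborhood` is hypothetical.

```python
def neighborhood(i, d, rows=5, cols=5):
    """Return N_i(d) = {j : d_ij <= d} for neurons numbered 1..rows*cols row by row."""
    ri, ci = divmod(i - 1, cols)
    result = set()
    for j in range(1, rows * cols + 1):
        rj, cj = divmod(j - 1, cols)
        if abs(ri - rj) + abs(ci - cj) <= d:
            result.add(j)
    return result

print(sorted(neighborhood(13, 1)))  # [8, 12, 13, 14, 18]
print(sorted(neighborhood(13, 2)))  # [3, 7, 8, 9, 11, 12, 13, 14, 15, 17, 18, 19, 23]
```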


Note that the neurons in an SOFM do not have to be arranged in a
two-dimensional pattern. You can use a one-dimensional arrangement, or even
three or more dimensions. For a one-dimensional SOFM, a neuron has only two
neighbors within a radius of 1 (or a single neighbor if the neuron is at the end
of the line). You can also define distance in different ways, for instance, by using
rectangular and hexagonal arrangements of neurons and neighborhoods. The
performance of the network is not sensitive to the exact shape of the
neighborhoods.


Topologies (gridtop, hextop, randtop) 


You can specify different topologies for the original neuron locations with the
functions gridtop, hextop, or randtop.


The gridtop topology starts with neurons in a rectangular grid similar to that
shown in the previous figure. For example, suppose that you want a 2-by-3
array of six neurons. You can get this with:


pos = gridtop(2,3)

pos = 
0 1 0 1 0 1 
0 0 1 1 2 2 







Here neuron 1 has the position (0,0); neuron 2 has the position (1,0); neuron 3
has the position (0,1); etc.


Figure: gridtop(2,3) neuron positions





Note that had we asked for a gridtop with the arguments reversed we would 
have gotten a slightly different arrangement. 


pos = gridtop(3,2) 

pos = 
0 1 2 0 1 2 
0 0 0 1 1 1 
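Both outputs above are consistent with a simple rule: in a d1-by-d2 grid, neuron k (counting from 1) sits at column (k-1) mod d1 and row (k-1) div d1. The sketch below reproduces that layout for the two-dimensional case only; the rule is inferred from the printed positions and the function name is hypothetical, so treat it as an illustration rather than the toolbox's gridtop.

```python
def gridtop_2d(d1, d2):
    """Return [x-row, y-row] positions for a d1-by-d2 grid, neuron 1 at (0, 0)."""
    xs, ys = [], []
    for k in range(d1 * d2):
        xs.append(k % d1)   # column within the row
        ys.append(k // d1)  # row number
    return [xs, ys]

print(gridtop_2d(2, 3))  # [[0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 2, 2]]
print(gridtop_2d(3, 2))  # [[0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1]]
```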


An 8-by-10 set of neurons in a gridtop topology can be created and plotted with
the code shown below


pos = gridtop(8,10);
plotsom(pos)


to give the following graph. 
As shown, the neurons in the gridtop topology do indeed lie on a grid.


The hextop function creates a similar set of neurons, but they are in a
hexagonal pattern. A 2-by-3 pattern of hextop neurons is generated as follows:


pos = hextop(2,3) 

pos = 
0 1.0000 0.5000 1.5000 0 1.0000 
0 0 0.8660 0.8660 1.7321 1.7321 


Note that hextop is the default pattern for SOFM networks generated with 
newsom. 
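The hextop(2,3) output above is consistent with a layout in which each row sits sqrt(3)/2 ≈ 0.866 above the previous one and odd rows (counting from 0) are shifted right by 0.5. The sketch below reproduces that layout for the two-dimensional case; the offset rule is inferred from the printed positions rather than taken from the toolbox source, and `hextop_2d` is a hypothetical name.

```python
import math

def hextop_2d(d1, d2):
    """Hexagonal layout: row r is shifted by 0.5 when r is odd and raised by r*sqrt(3)/2."""
    xs, ys = [], []
    for k in range(d1 * d2):
        col, row = k % d1, k // d1
        xs.append(col + 0.5 * (row % 2))
        ys.append(row * math.sqrt(3) / 2)
    return [xs, ys]

xs, ys = hextop_2d(2, 3)
print([round(x, 4) for x in xs])  # [0.0, 1.0, 0.5, 1.5, 0.0, 1.0]
print([round(y, 4) for y in ys])  # [0.0, 0.0, 0.866, 0.866, 1.7321, 1.7321]
```

With this spacing every neuron is exactly 1 unit from its horizontal and diagonal neighbors, which is what makes the pattern hexagonal.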


An 8-by-10 set of neurons in a hextop topology can be created and plotted with
the code shown below:


pos = hextop(8,10);
plotsom(pos)


to give the following graph. 


Figure: Neuron Positions (hextop(8,10)), plotted as position(1,i) versus position(2,i)
Note the positions of the neurons in a hexagonal arrangement.


Finally, the randtop function creates neurons in an N-dimensional random
pattern. The following code generates a random pattern of neurons.


pos = randtop(2,3)
pos = 
0 0.7787 0.4390 1.0657 0.1470 0.9070 
0 0.1925 0.6476 0.9106 1.6490 1.4027 
An 8-by-10 set of neurons in a randtop topology can be created and plotted with
the code shown below:


pos = randtop(8,10);
plotsom(pos)
to give the following graph. 


Figure: Neuron Positions (randtop(8,10)), plotted as position(1,i) versus position(2,i)


For examples, see the help for these topology functions. 


Distance Funct. (dist, linkdist, mandist, boxdist)

In this toolbox, there are four distinct ways to calculate distances from a
particular neuron to its neighbors. Each calculation method is implemented
with a special function.


The dist function has been discussed before. It calculates the Euclidean
distance from a home neuron to any other neuron. Suppose we have three
neurons:


pos2 = [0 1 2; 0 1 2]
pos2 =
     0     1     2
     0     1     2


We find the distance from each neuron to the other with


D2 = dist(pos2) 
D2 = 
0 1.4142 2.8284 
1.4142 0 1.4142 
2.8284 1.4142 0 


Thus, the distance from neuron 1 to itself is 0, the distance from neuron 1 to
neuron 2 is 1.414, etc. These are indeed the Euclidean distances as we know
them.
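For comparison, here is a pure-Python sketch of the same computation: each column of pos2 is treated as one neuron position, and all pairwise Euclidean distances are computed. The helper name `dist_matrix` is hypothetical; this is not the toolbox's dist.

```python
import math

def dist_matrix(pos):
    """Euclidean distance between every pair of columns of pos (given as coordinate rows)."""
    points = list(zip(*pos))  # one tuple per neuron
    return [[math.dist(a, b) for b in points] for a in points]

pos2 = [[0, 1, 2], [0, 1, 2]]
for row in dist_matrix(pos2):
    print([round(v, 4) for v in row])
# [0.0, 1.4142, 2.8284]
# [1.4142, 0.0, 1.4142]
# [2.8284, 1.4142, 0.0]
```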


The graph below shows a home neuron in a two-dimensional (gridtop) layer of
neurons. The home neuron has neighborhoods of increasing diameter
surrounding it. A neighborhood of diameter 1 includes the home neuron and its
immediate neighbors. The neighborhood of diameter 2 includes the diameter 1
neurons and their immediate neighbors.


Figure: home neuron in a 2-dimensional layer of neurons, surrounded by neighborhoods 1, 2, and 3


As for the dist function, all the neighborhoods for an S-neuron layer map are
represented by an S-by-S matrix of distances. The particular distances shown
above (1 in the immediate neighborhood, 2 in neighborhood 2, etc.) are
generated by the function boxdist. Suppose that we have six neurons in a
gridtop configuration.
pos = gridtop(2,3)
pos =
     0     1     0     1     0     1
     0     0     1     1     2     2


Then the box distances are 


d = boxdist(pos) 

d = 
0 1 1 1 2 2 
1 0 1 1 2 2 
1 1 0 1 1 1 
1 1 1 0 1 1 
2 2 1 1 0 1 
2 2 1 1 1 0 


The distance from neuron 1 to 2, 3, and 4 is just 1, for they are in the immediate
neighborhood. The distance from neuron 1 to both 5 and 6 is 2. The distance
from both 3 and 4 to all other neurons is just 1.
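The box distances above match the largest absolute difference between two neurons' coordinates (a Chebyshev distance). The sketch below reproduces the matrix on that reading; `box_distances` is a hypothetical helper, and the toolbox's boxdist may differ in details such as argument checking.

```python
def box_distances(pos):
    """Box distance: the largest coordinate difference between two neuron positions."""
    points = list(zip(*pos))
    return [[max(abs(x - y) for x, y in zip(a, b)) for b in points] for a in points]

pos = [[0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 2, 2]]  # gridtop(2,3) positions from the text
for row in box_distances(pos):
    print(row)
# [0, 1, 1, 1, 2, 2]
# [1, 0, 1, 1, 2, 2]
# [1, 1, 0, 1, 1, 1]
# [1, 1, 1, 0, 1, 1]
# [2, 2, 1, 1, 0, 1]
# [2, 2, 1, 1, 1, 0]
```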


The link distance from one neuron is just the number of links, or steps, that
must be taken to get to the neuron under consideration. Thus, if we calculate
the distances from the same set of neurons with linkdist, we get

dlink = linkdist(pos)
dlink =
0 1 1 2 2 3 
1 0 2 1 3 2 
1 2 0 1 1 2 
2 1 1 0 2 1 
2 3 1 2 0 1 
3 2 2 1 1 0 
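Link distances can be reproduced with a breadth-first search over neuron positions. The sketch assumes, consistent with the matrix above, that two neurons are directly linked when their positions are at most 1 apart in Euclidean distance; `link_distances` is a hypothetical helper, and any unreachable neuron is simply left as None.

```python
import math
from collections import deque

def link_distances(pos):
    """Number of links (steps) between neurons; neurons at most 1 apart are linked."""
    points = list(zip(*pos))
    n = len(points)
    adjacent = [[j for j in range(n) if j != i and math.dist(points[i], points[j]) <= 1]
                for i in range(n)]
    result = []
    for start in range(n):
        steps = [None] * n
        steps[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search from the start neuron
            i = queue.popleft()
            for j in adjacent[i]:
                if steps[j] is None:
                    steps[j] = steps[i] + 1
                    queue.append(j)
        result.append(steps)
    return result

pos = [[0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 2, 2]]  # gridtop(2,3) positions
for row in link_distances(pos):
    print(row)
# [0, 1, 1, 2, 2, 3]
# [1, 0, 2, 1, 3, 2]
# [1, 2, 0, 1, 1, 2]
# [2, 1, 1, 0, 2, 1]
# [2, 3, 1, 2, 0, 1]
# [3, 2, 2, 1, 1, 0]
```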


The Manhattan distance between two vectors x and y is calculated as

D = sum(abs(x-y))


Thus if we have

W1 = [1 2; 3 4; 5 6]
W1 =
     1     2
     3     4
     5     6

and

P1 = [1;1]
P1 =
     1
     1

then we get for the distances

Z1 = mandist(W1,P1)
Z1 =
     1
     5
     9


The distances calculated with mandist do indeed follow the mathematical
expression given above.
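The same arithmetic in plain Python, treating each row of W1 as one weight vector and applying D = sum(abs(x-y)) against P1. `manhattan_distances` is a hypothetical helper, not the toolbox's mandist.

```python
def manhattan_distances(W, p):
    """Manhattan distance sum(abs(w - p)) from each row w of W to the vector p."""
    return [sum(abs(w - x) for w, x in zip(row, p)) for row in W]

W1 = [[1, 2], [3, 4], [5, 6]]
P1 = [1, 1]
print(manhattan_distances(W1, P1))  # [1, 5, 9]
```

For example, the second row gives |3 - 1| + |4 - 1| = 2 + 3 = 5.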


Architecture
The architecture for this SOFM is shown below. 


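Figure: self-organizing map architecture (input, self-organizing map layer), computing

$$n^1_i = -\lVert {}_{i}\mathbf{IW}^{1,1} - \mathbf{p} \rVert, \qquad \mathbf{a}^1 = \mathrm{compet}(\mathbf{n}^1)$$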


This architecture is like that of a competitive network, except no bias is used
here. The competitive transfer function produces a 1 for output element $a^1_{i^*}$
corresponding to $i^*$, the winning neuron. All other output elements in $\mathbf{a}^1$ are 0.


Now, however, as described above, neurons close to the winning neuron are
updated along with the winning neuron. As described previously, one can choose
from various topologies of neurons. Similarly, one can choose from various
distance expressions to calculate neurons that are close to the winning neuron.


Creating a Self-Organizing Map Neural Network (newsom)

You can create a new SOFM network with the function newsom. This function
defines variables used in two phases of learning:

• Ordering-phase learning rate
• Ordering-phase steps
• Tuning-phase learning rate
• Tuning-phase neighborhood distance


These values are used for training and adapting. 
Consider the following example.


Suppose that we want to create a network having input vectors with two
elements that fall in the range 0 to 2 and 0 to 1, respectively. Further suppose
that we want to have six neurons in a hexagonal 2-by-3 network. The code to
obtain this network is


net = newsom([0 2; 0 1],[2 3]);


Suppose also that the vectors to train on are

P = [ ... ;
     0.2 0.1 0.3 0.1 0.3 0.2 1.8 1.8 1.9 1.9 1.7 1.8]


We can plot all of this with

plot(P(1,:),P(2,:),'.g','markersize',20)
hold on
plotsom(net.iw{1,1},net.layers{1}.distances)
hold off


to give 
Figure: Weight Vectors, plotted as W(i,1) versus W(i,2)


The various training vectors are seen as fuzzy gray spots around the perimeter
of this figure. The initialization for newsom is midpoint. Thus, the initial
network neurons are all concentrated at the black spot at (1, 0.5).


When simulating a network, the negative distances between each neuron's
weight vector and the input vector are calculated (negdist) to get the weighted
inputs. The weighted inputs are also the net inputs (netsum). The net inputs
compete (compet) so that only the neuron with the most positive net input will
output a 1.


Training (learnsom) 


Learning in a self-organizing feature map occurs for one vector at a time,
independent of whether the network is trained directly (trainr) or whether it
is trained adaptively (trains). In either case, learnsom is the self-organizing
map weight learning function.


First the network identifies the winning neuron. Then the weights of the
winning neuron, and the other neurons in its neighborhood, are moved closer
to the input vector at each learning step using the self-organizing map learning
function learnsom. The winning neuron's weights are altered proportional to
the learning rate. The weights of neurons in its neighborhood are altered
proportional to half the learning rate. The learning rate and the neighborhood
distance used to determine which neurons are in the winning neuron's
neighborhood are altered during training through two phases.


Phase 1: Ordering Phase 


This phase lasts for the given number of steps. The neighborhood distance
starts as the maximum distance between two neurons, and decreases to the
tuning neighborhood distance. The learning rate starts at the ordering-phase
learning rate and decreases until it reaches the tuning-phase learning rate. As
the neighborhood distance and learning rate decrease over this phase, the
neurons of the network typically order themselves in the input space with the
same topology in which they are ordered physically.


Phase 2: Tuning Phase 


This phase lasts for the rest of training or adaption. The neighborhood distance
stays at the tuning neighborhood distance, which should include only close
neighbors (typically 1.0). The learning rate continues to decrease from the
tuning phase learning rate, but very slowly. The small neighborhood and
slowly decreasing learning rate fine tune the network, while keeping the
ordering learned in the previous phase stable. The number of epochs for the
tuning part of training (or time steps for adaption) should be much larger than
the number of steps in the ordering phase, because the tuning phase usually
takes much longer.


Now let us take a look at some of the specific values commonly used in these
networks.







Learning occurs according to the learnsom learning parameters, shown here
with their default values.





LP.order_lr      0.9     Ordering-phase learning rate.
LP.order_steps   1000    Ordering-phase steps.
LP.tune_lr       0.02    Tuning-phase learning rate.
LP.tune_nd       1       Tuning-phase neighborhood distance.





The function learnsom calculates the weight change dW for a given neuron from the neuron's
input P, activation A2, and learning rate LR:


dw = lr*a2*(p'-w)
where the activation A2 is found from the layer output A and neuron distances 
D and the current neighborhood size ND: 


$$a_2(i,q) = \begin{cases} 1, & \text{if } a(i,q) = 1 \\ 0.5, & \text{if } a(j,q) = 1 \text{ and } D(i,j) \le nd \\ 0, & \text{otherwise} \end{cases}$$


The learning rate LR and neighborhood size ND are altered through two phases:
an ordering phase and a tuning phase.


The ordering phase lasts as many steps as LP.order_steps. During this phase,
LR is adjusted from LP.order_lr down to LP.tune_lr, and ND is adjusted from
the maximum neuron distance down to 1. It is during this phase that neuron
weights are expected to order themselves in the input space consistent with the
associated neuron positions.


During the tuning phase LR decreases slowly from LP.tune_lr and ND is always
set to LP.tune_nd. During this phase, the weights are expected to spread out
relatively evenly over the input space while retaining their topological order
found during the ordering phase.
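The learnsom update can be sketched for a single input vector as follows. The activation rule (1 for the winner, 0.5 for neurons within nd of it, 0 otherwise) and the change dw = lr*a2*(p'-w) follow the description above; the helper names, the three-neuron example, and lr = 0.5 are illustrative assumptions, and the phase schedules for LR and ND are deliberately left out.

```python
import math

def som_activation(winner, distances, nd):
    """a2: 1 for the winner, 0.5 for neurons within nd of it, 0 otherwise."""
    a2 = []
    for i, d in enumerate(distances[winner]):
        if i == winner:
            a2.append(1.0)
        elif d <= nd:
            a2.append(0.5)
        else:
            a2.append(0.0)
    return a2

def som_update(weights, p, distances, lr, nd):
    """One step dw = lr * a2 * (p - w) applied to every neuron's weight row."""
    winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], p))
    a2 = som_activation(winner, distances, nd)
    new_weights = [[w + lr * a * (x - w) for w, x in zip(row, p)]
                   for row, a in zip(weights, a2)]
    return new_weights, a2

# Three neurons on a line, so their link distances are 0, 1, or 2.
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
W = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
W, a2 = som_update(W, [0.0, 0.2], D, lr=0.5, nd=1)
print(a2)  # [1.0, 0.5, 0.0]
```

Neuron 1 wins, neuron 2 (one link away) moves at half the rate, and neuron 3 (two links away) does not move while nd = 1.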


Thus, the neurons' weight vectors initially take large steps all together toward
the area of input space where input vectors are occurring. Then as the
neighborhood size decreases to 1, the map tends to order itself topologically
over the presented input vectors. Once the neighborhood size is 1, the network
should be fairly well ordered and the learning rate is slowly decreased over a
longer period to give the neurons time to spread out evenly across the input
vectors.


As with competitive layers, the neurons of a self-organizing map will order
themselves with approximately equal distances between them if input vectors
appear with even probability throughout a section of the input space. Also, if
input vectors occur with varying frequency throughout the input space, the
feature map layer tends to allocate neurons to an area in proportion to the
frequency of input vectors there.


Thus, feature maps, while learning to categorize their input, also learn both 
the topology and distribution of their input. 


We can train the network for 1000 epochs with 


net.trainParam.epochs = 1000 
net = train(net,P) ; 


This training produces the following plot. 

























Figure: Weight Vectors after 1000 epochs, plotted as W(i,1) versus W(i,2)


We can see that the neurons have started to move toward the various training
groups. Additional training is required to get the neurons closer to the various
groups.


As noted previously, self-organizing maps differ from conventional competitive
learning in terms of which neurons get their weights updated. Instead of
updating only the winner, feature maps update the weights of the winner and
its neighbors. The result is that neighboring neurons tend to have similar
weight vectors and to be responsive to similar input vectors.


Examples 


Two examples are described briefly below. You might try the demonstration
scripts demosm1 and demosm2 to see similar examples.
One-Dimensional Self-Organizing Map


Consider 100 two-element unit input vectors spread evenly between 0° and 90°.

angles = 0:0.5*pi/99:0.5*pi;
P = [sin(angles); cos(angles)];

Here is a plot of the data.





We define a self-organizing map as a one-dimensional layer of 10 neurons.


This map is to be trained on the input vectors shown above. Originally these
neurons will be at the center of the figure.
Figure: initial weight vectors, plotted as W(i,1) versus W(i,2)


Of course, since all the weight vectors start in the middle of the input vector
space, all you see now is a single circle.


As training starts the weight vectors move together toward the input vectors.
They also become ordered as the neighborhood size decreases. Finally the layer
adjusts its weights so that each neuron responds strongly to a region of the
input space occupied by input vectors. The placement of neighboring neuron
weight vectors also reflects the topology of the input vectors.
Note that self-organizing maps are trained with input vectors in a random
order, so starting with the same initial vectors does not guarantee identical
training results.


Two-Dimensional Self-Organizing Map 


This example shows how a two-dimensional self-organizing map can be 
trained. 


First some random input data is created with the following code. 
P = rands(2,1000) ; 


Here is a plot of these 1000 input vectors.










A two-dimensional map of 30 neurons is used to classify these input vectors.
The two-dimensional map is five neurons by six neurons, with distances
calculated according to the Manhattan distance neighborhood function
mandist.


The map is then trained for 5000 presentation cycles, with displays every 20
cycles.


Here is what the self-organizing map looks like after 40 cycles. 
Figure: map after 40 cycles, plotted as W(i,1) versus W(i,2)


The weight vectors, shown with circles, are almost randomly placed. However,
even after only 40 presentation cycles, neighboring neurons, connected by
lines, have weight vectors close together.


Here is the map after 120 cycles. 


Figure: map after 120 cycles, plotted as W(i,1) versus W(i,2)







After 120 cycles, the map has begun to organize itself according to the topology
of the input space which constrains input vectors.


The following plot, after 500 cycles, shows the map is more evenly distributed
across the input space.


Figure: map after 500 cycles, plotted as W(i,1) versus W(i,2)


Finally, after 5000 cycles, the map is rather evenly spread across the input
space. In addition, the neurons are very evenly spaced, reflecting the even
distribution of input vectors in this problem.
Figure: map after 5000 cycles, plotted as W(i,1) versus W(i,2)


Thus a two-dimensional self-organizing map has learned the topology of its
inputs' space.


It is important to note that while a self-organizing map does not take long to
organize itself so that neighboring neurons recognize similar inputs, it can take
a long time for the map to finally arrange itself according to the distribution of
input vectors.
Learning Vector Quantization Networks


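Architecture
The LVQ network architecture is shown below.

Figure: LVQ network architecture (input, competitive layer, linear layer), where
R = number of elements in input vector
S1 = number of competitive neurons
S2 = number of linear neurons

$$n^1_i = -\lVert {}_{i}\mathbf{IW}^{1,1} - \mathbf{p} \rVert, \qquad \mathbf{a}^1 = \mathrm{compet}(\mathbf{n}^1), \qquad \mathbf{a}^2 = \mathrm{purelin}(\mathbf{LW}^{2,1}\mathbf{a}^1)$$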


An LVQ network has a first competitive layer and a second linear layer. The
competitive layer learns to classify input vectors in much the same way as the
competitive layers described earlier in this chapter. The linear layer transforms
the competitive layer's classes into target classifications defined by the user.
We refer to the classes learned by the competitive layer as subclasses and the
classes of the linear layer as target classes.


Both the competitive and linear layers have one neuron per (sub or target)
class. Thus, the competitive layer can learn up to S1 subclasses. These, in turn,
are combined by the linear layer to form S2 target classes. (S1 is always larger
than S2.)


For example, suppose neurons 1, 2, and 3 in the competitive layer all learn
subclasses of the input space that belong to the linear layer target class No. 2.
Then competitive neurons 1, 2, and 3 will have LW2,1 weights of 1.0 to neuron
n2 in the linear layer, and weights of 0 to all other linear neurons. Thus, the
linear neuron produces a 1 if any of the three competitive neurons (1, 2, and 3)
wins the competition and outputs a 1. This is how the subclasses of the
competitive layer are combined into target classes in the linear layer.
In short, a 1 in the ith row of a1 (the rest of the elements of a1 will be zero) effectively picks the ith column of LW2,1 as the network output. Each such column contains a single 1, corresponding to a specific class. Thus, subclass 1s from layer 1 get put into various classes by the LW2,1 a1 multiplication in layer 2.
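The column-selection effect of the LW2,1 a1 product can be seen in a short plain-Python sketch (not toolbox code). The weight matrix is a hypothetical layout chosen to mirror the example above: competitive neurons 1, 2, and 3 map to target class 2, and an assumed fourth neuron maps to class 1.

```python
def linear_layer(LW, a1):
    """Return a2 = LW * a1 as a plain matrix-vector product."""
    return [sum(w * a for w, a in zip(row, a1)) for row in LW]


# Hypothetical LW2,1: columns are competitive neurons 1-4, rows are target classes 1-2.
# Neurons 1, 2, 3 belong to class 2; neuron 4 (an assumption) belongs to class 1.
LW21 = [
    [0, 0, 0, 1],  # target class 1
    [1, 1, 1, 0],  # target class 2
]

# A one-hot a1 (neuron 2 won the competition) picks column 2 of LW21.
a1 = [0, 1, 0, 0]
print(linear_layer(LW21, a1))  # [0, 1] -> target class 2
```

Because a1 is one-hot, the product never mixes columns. The layer acts purely as a lookup from the winning subclass to its target class.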


We know ahead of time what fraction of the layer 1 neurons should be classified into the various class outputs of layer 2, so we can specify the elements of LW2,1 at the start. However, we have to go through a training procedure to get the first layer to produce the correct subclass output for each vector of the training set. We discuss this training shortly. First consider how to create the original network.


Creating an LVQ Network (newlvq)


An LVQ network can be created with the function newlvq:
net = newlvq(PR,S1,PC,LR,LF)
where:
• PR is an R-by-2 matrix of minimum and maximum values for R input elements.
• S1 is the number of first-layer hidden neurons.
• PC is an S2-element vector of typical class percentages.
• LR is the learning rate (default 0.01).
• LF is the learning function (default is learnlv1).
Suppose we have 10 input vectors. We create a network that assigns each of these input vectors to one of four subclasses. Thus, we have four neurons in the first competitive layer. These subclasses are then assigned to one of two output classes by the two neurons in layer 2. The input vectors and targets are specified by


P = [-3 -2 -2 0 0 0 0 +2 +2 +3;
0 +1 -1 +2 +1 -1 -2 +1 -1 0];

and

Tc = [1 1 1 2 2 2 2 1 1 1];


It may help to show the details of what we get from these two lines of code.


P =
-3 -2 -2 0 0 0 0 2 2 3
0 1 -1 2 1 -1 -2 1 -1 0
Tc =
1 1 1 2 2 2 2 1 1 1

[Figure: Input Vectors]


As you can see, there are four subclasses of input vectors. We want a network that classifies p1, p2, p3, p8, p9, and p10 to produce an output of 1, and that classifies vectors p4, p5, p6, and p7 to produce an output of 2. Note that this problem is nonlinearly separable, and so cannot be solved by a perceptron, but an LVQ network has no difficulty.


Next we convert the Tc matrix to target vectors.

T = ind2vec(Tc);

This gives a sparse matrix T that can be displayed in full with

targets = full(T)

which gives

targets =
1 1 1 0 0 0 0 1 1 1
0 0 0 1 1 1 1 0 0 0
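For readers without the toolbox at hand, the conversion between class indices and one-hot target columns can be sketched in plain Python. The two functions below are hypothetical stand-ins named after ind2vec and vec2ind. Unlike the toolbox versions, they return dense lists of lists rather than sparse matrices.

```python
def ind2vec(indices, n_classes=None):
    """Turn 1-based class indices into one-hot columns (rows = classes)."""
    n = n_classes if n_classes is not None else max(indices)
    return [[1 if idx == row + 1 else 0 for idx in indices] for row in range(n)]


def vec2ind(matrix):
    """Return the 1-based row index of the largest entry in each column."""
    result = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        result.append(column.index(max(column)) + 1)
    return result


Tc = [1, 1, 1, 2, 2, 2, 2, 1, 1, 1]
targets = ind2vec(Tc)
for row in targets:
    print(row)
# [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

The two functions are inverses on one-hot data. Calling vec2ind(targets) returns Tc again.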


This looks right. It says, for instance, that if we have the first column of P as input, we should get the first column of targets as an output; and that output says the input falls in class 1, which is correct. Now we are ready to call newlvq.


We call newlvq with the proper arguments so that it creates a network with four neurons in the first layer and two neurons in the second layer. The first-layer weights are initialized to the center of the input ranges with the function midpoint. The second-layer weights have 60% (6 of the 10 in Tc above) of their columns with a 1 in the first row (corresponding to class 1), and 40% of their columns with a 1 in the second row (corresponding to class 2).


net = newlvq(minmax(P),4,[.6 .4],0.1);
We can check to see the initial values of the first-layer weight matrix.

net.IW{1,1}
ans =
0 0
0 0
0 0
0 0


These zero weights are indeed the values at the midpoint of the range (-3 to +3) of the inputs, as we would expect when using midpoint for initialization.


We can look at the second-layer weights with

net.LW{2,1}

ans =
1 1 0 0
0 0 1 1

This makes sense too. It says that if the competitive layer produces a 1 as the first or second element, the input vector is classified as class 1; otherwise it is a class 2.


You may notice that the first two competitive neurons are connected to the first linear neuron (with weights of 1), while the second two competitive neurons are connected to the second linear neuron. All other weights between the competitive neurons and linear neurons have values of 0. Thus, each of the two target classes (the linear neurons) is, in fact, the union of two subclasses (the competitive neurons).


We can simulate the network with sim. We use the original P matrix as input just to see what we get.

Y = sim(net,P);
Yc = vec2ind(Y)
Yc =
1 1 1 1 1 1 1 1 1 1

The network classifies all inputs into class 1. Since this is not what we want, we have to train the network (adjusting the weights of layer 1 only) before we can expect a good result. First we discuss two LVQ learning rules, and then we look at the training process.
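A plain-Python sketch of the LVQ forward pass shows why the untrained network puts every input in class 1. The pass computes the negative distance, applies compet, and then multiplies by LW2,1. With all first-layer rows at zero, every competitive neuron sees the same distance for a given input, so the competition is a tie. The sketch breaks ties toward the lowest-numbered neuron. That rule is an assumption chosen because it matches the output shown above, not a statement about compet's internals.

```python
import math


def lvq_classify(IW, LW, p):
    """Forward pass of an LVQ net; returns the 1-based target class."""
    # Layer 1: n1_i = -||iIW - p||, the negative Euclidean distance to each row.
    n1 = [-math.dist(w, p) for w in IW]
    # compet: the first neuron with the largest n1 outputs 1 (ties go to the lowest index).
    winner = n1.index(max(n1))
    a1 = [1 if i == winner else 0 for i in range(len(IW))]
    # Layer 2: a2 = LW * a1, then report the row holding the 1.
    a2 = [sum(l * a for l, a in zip(row, a1)) for row in LW]
    return a2.index(max(a2)) + 1


# Columns of P from the text, written as (x, y) pairs.
P = [(-3, 0), (-2, 1), (-2, -1), (0, 2), (0, 1),
     (0, -1), (0, -2), (2, 1), (2, -1), (3, 0)]
IW_init = [[0, 0], [0, 0], [0, 0], [0, 0]]  # midpoint initialization
LW = [[1, 1, 0, 0], [0, 0, 1, 1]]

print([lvq_classify(IW_init, LW, p) for p in P])
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```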


LVQ1 Learning Rule (learnlv1)


LVQ learning in the competitive layer is based on a set of input/target pairs:
$$ \{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\} $$


Each target vector has a single 1. The rest of its elements are 0. The 1 tells the proper classification of the associated input. For instance, consider the following training pair.


Here we have input vectors of three elements, and each input vector is to be assigned to one of four classes. The network is to be trained so that it classifies the input vector shown above into the third of four classes.


To train the network, an input vector p is presented, and the distance from p to each row of the input weight matrix IW1,1 is computed with the function negdist. The hidden neurons of layer 1 compete. Suppose that the i*th element of n1 is most positive, and neuron i* wins the competition. Then the competitive transfer function produces a 1 as the i*th element of a1. All other elements of a1 are 0.

When a1 is multiplied by the layer 2 weights LW2,1, the single 1 in a1 selects the class k* associated with the input. Thus, the network has assigned the input vector p to class k*, and $a^2_{k^*}$ will be 1. Of course, this assignment may be a good one or a bad one, for $t_{k^*}$ may be 1 or 0, depending on whether the input belonged to class k* or not.


We adjust the i*th row of IW1,1 in such a way as to move this row closer to the input vector p if the assignment is correct, and to move the row away from p if the assignment is incorrect. So if p is classified correctly ($a^2_{k^*} = t_{k^*} = 1$), we compute the new value of the i*th row of IW1,1 as:
$$ {}_{i^*}\mathbf{IW}^{1,1}(q) = {}_{i^*}\mathbf{IW}^{1,1}(q-1) + \alpha\left(\mathbf{p}(q) - {}_{i^*}\mathbf{IW}^{1,1}(q-1)\right) $$
On the other hand, if p is classified incorrectly ($a^2_{k^*} = 1 \neq t_{k^*} = 0$), we compute the new value of the i*th row of IW1,1 as:
$$ {}_{i^*}\mathbf{IW}^{1,1}(q) = {}_{i^*}\mathbf{IW}^{1,1}(q-1) - \alpha\left(\mathbf{p}(q) - {}_{i^*}\mathbf{IW}^{1,1}(q-1)\right) $$


These corrections to the i*th row of IW1,1 can be made automatically, without affecting other rows of IW1,1, by backpropagating the output errors back to layer 1.


Such corrections move the hidden neuron towards vectors that fall into the class for which it forms a subclass, and away from vectors that fall into other classes.


The learning function that implements these changes in the layer 1 weights in LVQ networks is learnlv1. It can be applied during training.
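The two update equations differ only in sign, so a single step of the LVQ1 rule fits in a few lines. The plain-Python sketch below is illustrative and is not the toolbox's learnlv1. The function name and the list-of-lists weight layout are assumptions for this example. So is reading the winner's class from the row of LW2,1 that holds a 1 in the winner's column.

```python
import math


def lvq1_step(IW, LW, p, target_class, lr=0.01):
    """Apply one LVQ1 update for input p; return a new weight list."""
    # Winner i*: the row of IW closest to p (the most positive -distance).
    dists = [math.dist(w, p) for w in IW]
    winner = dists.index(min(dists))
    # Class k* of the winner: the row of LW that has a 1 in column i*.
    predicted = [row[winner] for row in LW].index(1) + 1
    # +alpha moves the row toward p (correct), -alpha moves it away (incorrect).
    sign = 1 if predicted == target_class else -1
    new_IW = [list(w) for w in IW]  # other rows are left untouched
    new_IW[winner] = [w + sign * lr * (x - w) for w, x in zip(IW[winner], p)]
    return new_IW


IW = [[0.0, 0.0], [1.0, 1.0]]
LW = [[1, 0], [0, 1]]  # neuron 1 -> class 1, neuron 2 -> class 2
p = [0.5, 0.0]

print(lvq1_step(IW, LW, p, target_class=1, lr=0.5))  # [[0.25, 0.0], [1.0, 1.0]]
print(lvq1_step(IW, LW, p, target_class=2, lr=0.5))  # [[-0.25, 0.0], [1.0, 1.0]]
```

In both calls only the winning row changes. This matches the statement that the corrections leave the other rows of IW1,1 unaffected.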


Training
Next we need to train the network to obtain first-layer weights that lead to the correct classification of input vectors. We do this with train as shown below. First set the training epochs to 150. Then, use train.


net.trainParam.epochs = 150;
net = train(net,P,T);


Now check on the first-layer weights.


net.IW{1,1}
ans =
1.0927 0.0051
-1.1028 -0.1288
0 -0.5168
0 0.3710


The following plot shows that these weights have moved toward their respective classification groups.

[Figure: Weights (circles) after training]


To check that these weights do indeed lead to the correct classification, take the matrix P as input and simulate the network. Then see what classifications are produced by the network.


Y = sim(net,P);
Yc = vec2ind(Y)


This gives
Yc =
1 1 1 2 2 2 2 1 1 1


which is what we expected. As a last check, try an input close to a vector that was used in training.

pchk1 = [0; 0.5];
Y = sim(net,pchk1);
Yc1 = vec2ind(Y)


This gives
Yc1 =
2
This looks right, for pchk1 is close to other vectors classified as 2. Similarly,


pchk2 = [1; 0];
Y = sim(net,pchk2);
Yc2 = vec2ind(Y)


gives
Yc2 =
1


This looks right too, for pchk2 is close to other vectors classified as 1.
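As a cross-check that needs no toolbox, the trained first-layer weights printed above can be plugged into a plain-Python version of the LVQ forward pass: negative distance, compet, then LW2,1. The sketch uses the four-decimal weights shown and the LW2,1 = [1 1 0 0; 0 0 1 1] from earlier. It reproduces the classifications reported for P, pchk1, and pchk2. It is an illustration only, not the toolbox's sim.

```python
import math


def lvq_classify(IW, LW, p):
    """Nearest-row competition in layer 1, then class lookup through LW."""
    n1 = [-math.dist(w, p) for w in IW]   # n1_i = -||iIW - p||
    winner = n1.index(max(n1))            # compet
    a2 = [row[winner] for row in LW]      # LW * a1 with a one-hot a1
    return a2.index(max(a2)) + 1


IW_trained = [[1.0927, 0.0051],
              [-1.1028, -0.1288],
              [0.0, -0.5168],
              [0.0, 0.3710]]
LW = [[1, 1, 0, 0], [0, 0, 1, 1]]
P = [(-3, 0), (-2, 1), (-2, -1), (0, 2), (0, 1),
     (0, -1), (0, -2), (2, 1), (2, -1), (3, 0)]

print([lvq_classify(IW_trained, LW, p) for p in P])  # [1, 1, 1, 2, 2, 2, 2, 1, 1, 1]
print(lvq_classify(IW_trained, LW, (0, 0.5)))         # 2  (pchk1)
print(lvq_classify(IW_trained, LW, (1, 0)))           # 1  (pchk2)
```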


You might want to try the demonstration program demolvq1. It follows the discussion of training given above.


Supplemental LVQ2.1 Learning Rule (learnlv2)


The following learning rule is one that might be applied after first applying LVQ1. It may improve the result of the first learning. This particular version of LVQ2 (referred to as LVQ2.1 in the literature [Koho97]) is embodied in the function learnlv2. Note again that LVQ2.1 is to be used only after LVQ1 has been applied.


Learning here is similar to that in learnlv1, except now the two vectors of layer 1 that are closest to the input vector may be updated, provided that one belongs to the correct class and one belongs to a wrong class, and further provided that the input falls into a "window" near the midplane of the two vectors.


The window is defined by
$$ \min\left(\frac{d_i}{d_j}, \frac{d_j}{d_i}\right) > s, \qquad \text{where } s \equiv \frac{1-w}{1+w} $$
(where d_i and d_j are the Euclidean distances of p from i*IW1,1 and j*IW1,1, respectively). We take a value for w in the range 0.2 to 0.3. If we pick, for instance, 0.25, then s = 0.6. This means that if the minimum of the two distance ratios is greater than 0.6, we adjust the two vectors. That is, if the input is near the midplane, we adjust the two vectors, provided also that the input vector p and j*IW1,1 belong to the same class, and p and i*IW1,1 do not belong to the same class.


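The adjustments made are
$$ {}_{i^*}\mathbf{IW}^{1,1}(q) = {}_{i^*}\mathbf{IW}^{1,1}(q-1) - \alpha\left(\mathbf{p}(q) - {}_{i^*}\mathbf{IW}^{1,1}(q-1)\right) $$
and
$$ {}_{j^*}\mathbf{IW}^{1,1}(q) = {}_{j^*}\mathbf{IW}^{1,1}(q-1) + \alpha\left(\mathbf{p}(q) - {}_{j^*}\mathbf{IW}^{1,1}(q-1)\right) $$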


Thus, given the two vectors closest to the input, as long as one belongs to the wrong class and the other to the correct class, and as long as the input falls in a midplane window, the two vectors will be adjusted. Such a procedure allows a vector that is just barely classified correctly with LVQ1 to be moved even closer to the input, so the results are more robust.
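The window test and the paired update can be sketched in plain Python. This is a simplified illustration, not the toolbox's learnlv2. It assumes the caller has already found the two closest first-layer rows and knows which one (i*) is in the wrong class and which one (j*) is in the correct class. The function names are hypothetical.

```python
import math


def window_threshold(w):
    """s = (1 - w) / (1 + w); for w = 0.25 this is 0.6."""
    return (1 - w) / (1 + w)


def in_window(p, w_i, w_j, w=0.25):
    """True when p lies in the window near the midplane of w_i and w_j."""
    d_i, d_j = math.dist(p, w_i), math.dist(p, w_j)
    if d_i == 0 or d_j == 0:  # p sits on a prototype: the ratio is undefined
        return False
    return min(d_i / d_j, d_j / d_i) > window_threshold(w)


def lvq21_step(p, w_wrong, w_right, lr=0.01, w=0.25):
    """Move i* (wrong class) away from p and j* (right class) toward p."""
    if not in_window(p, w_wrong, w_right, w):
        return list(w_wrong), list(w_right)
    new_wrong = [x - lr * (pk - x) for x, pk in zip(w_wrong, p)]
    new_right = [x + lr * (pk - x) for x, pk in zip(w_right, p)]
    return new_wrong, new_right


# Input nearly midway between the two rows (ratio 1/1.2 > 0.6): both are adjusted.
print(lvq21_step([0.0, 0.0], [-1.0, 0.0], [1.2, 0.0], lr=0.5))
# Input far from the midplane (ratio 0.05): nothing changes.
print(lvq21_step([0.0, 0.0], [-0.1, 0.0], [2.0, 0.0], lr=0.5))
```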
Self-Organizing Maps

A competitive network learns to categorize the input vectors presented to it. If a neural network only needs to learn to categorize its input vectors, then a competitive network will do. Competitive networks also learn the distribution of inputs by dedicating more neurons to classifying parts of the input space with higher densities of input.


A self-organizing map learns to categorize input vectors. It also learns the distribution of input vectors. Feature maps allocate more neurons to recognize parts of the input space where many input vectors occur and allocate fewer neurons to parts of the input space where few input vectors occur.


Self-organizing maps also learn the topology of their input vectors. Neurons next to each other in the network learn to respond to similar vectors. The layer of neurons can be imagined to be a rubber net that is stretched over the regions in the input space where input vectors occur.


Self-organizing maps allow neurons that are neighbors to the winning neuron to output values. Thus the transition of output vectors is much smoother than that obtained with competitive layers, where only one neuron has an output at a time.


Learning Vector Quantization Networks


LVQ networks classify input vectors into target classes by using a competitive layer to find subclasses of input vectors, and then combining them into the target classes.


Unlike perceptrons, LVQ networks can classify any set of input vectors, not just linearly separable sets of input vectors. The only requirement is that the competitive layer must have enough neurons, and each class must be assigned enough competitive neurons.


To ensure that each class is assigned an appropriate number of competitive neurons, it is important that the target vectors used to initialize the LVQ network have the same distribution of targets as the training data the network is trained on. If this is done, target classes with more vectors will be the union of more subclasses.


Summary and Conclusions 





Figures 


Competitive Network Architecture

[Figure: Input p feeds a competitive layer with an S1 x R weight matrix and an S1 x 1 output.]





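Self-Organizing Feature Map Architecture

[Figure: Input p feeds a self-organizing map layer with $n^1_i = -\lVert {}_i\mathbf{IW}^{1,1} - \mathbf{p} \rVert$ and $\mathbf{a}^1 = \mathrm{compet}(\mathbf{n}^1)$.]


LVQ Architecture

[Figure: Input p (R elements) feeds a competitive layer, $n^1_i = -\lVert {}_i\mathbf{IW}^{1,1} - \mathbf{p} \rVert$, $\mathbf{a}^1 = \mathrm{compet}(\mathbf{n}^1)$, followed by a linear layer, $\mathbf{a}^2 = \mathrm{purelin}(\mathbf{LW}^{2,1}\mathbf{a}^1)$. R = number of elements in input vector; S1 = number of competitive neurons; S2 = number of linear neurons.]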


New Functions
This chapter introduces the following new functions.

Function    Description
newc        Create a competitive layer.
learnk      Kohonen learning rule.
newsom      Create a self-organizing map.
learncon    Conscience bias learning function.
boxdist     Distance between two position vectors.
dist        Euclidean distance weight function.
linkdist    Link distance function.
mandist     Manhattan distance weight function.
gridtop     Gridtop layer topology function.
hextop      Hexagonal layer topology function.
randtop     Random layer topology function.
newlvq      Create a learning vector quantization network.
learnlv1    LVQ1 weight learning function.
learnlv2    LVQ2 weight learning function.
Introduction


Recurrent networks are a topic of considerable interest. This chapter covers two recurrent networks: Elman and Hopfield networks.


Elman networks are two-layer backpropagation networks, with the addition of a feedback connection from the output of the hidden layer to its input. This feedback path allows Elman networks to learn to recognize and generate temporal patterns, as well as spatial patterns. The best paper on the Elman network is:


Elman, J. L., "Finding structure in time," Cognitive Science, vol. 14, 1990, pp. 179-211.


The Hopfield network is used to store one or more stable target vectors. These stable vectors can be viewed as memories that the network recalls when provided with similar vectors that act as a cue to the network memory. You may want to pursue a basic paper in this field:


Li, J., A. N. Michel, and W. Porod, "Analysis and synthesis of a class of neural networks: linear systems operating on a closed hypercube," IEEE Transactions on Circuits and Systems, vol. 36, no. 11, November 1989, pp. 1405-1422.


Important Recurrent Network Functions
Elman networks can be created with the function newelm.


Hopfield networks can be created with the function newhop.


Type help elman or help hopfield to see a list of functions and demonstrations related to either of these networks.


Elman Networks


Architecture


The Elman network commonly is a two-layer network with feedback from the first-layer output to the first-layer input. This recurrent connection allows the Elman network to both detect and generate time-varying patterns. A two-layer Elman network is shown below.








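[Figure: Elman network. Input p (R x 1) feeds a recurrent tansig layer (S1 x 1) and an output purelin layer, with $\mathbf{a}^1(k) = \mathrm{tansig}(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{LW}^{1,1}\mathbf{a}^1(k-1) + \mathbf{b}^1)$ and $\mathbf{a}^2(k) = \mathrm{purelin}(\mathbf{LW}^{2,1}\mathbf{a}^1(k) + \mathbf{b}^2)$.]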


The Elman network has tansig neurons in its hidden (recurrent) layer, and purelin neurons in its output layer. This combination is special in that two-layer networks with these transfer functions can approximate any function (with a finite number of discontinuities) with arbitrary accuracy. The only requirement is that the hidden layer must have enough neurons. More hidden neurons are needed as the function being fit increases in complexity.


Note that the Elman network differs from conventional two-layer networks in that the first layer has a recurrent connection. The delay in this connection stores values from the previous time step, which can be used in the current time step.


Thus, even if two Elman networks with the same weights and biases are given identical inputs at a given time step, their outputs can be different due to different feedback states.
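The dependence on feedback state can be checked with a plain-Python sketch of the two architecture equations, a1(k) = tansig(IW1,1 p + LW1,1 a1(k-1) + b1) and a2(k) = purelin(LW2,1 a1(k) + b2). The weights are hypothetical values chosen only for illustration. tansig is computed as math.tanh, which is mathematically the same function.

```python
import math


def elman_step(p, a1_prev, IW, LW11, b1, LW21, b2):
    """One time step: returns the new hidden state a1(k) and the output a2(k)."""
    a1 = [math.tanh(sum(w * x for w, x in zip(IW[i], p))
                    + sum(w * x for w, x in zip(LW11[i], a1_prev))
                    + b1[i])
          for i in range(len(b1))]                     # tansig (recurrent) layer
    a2 = [sum(w * x for w, x in zip(LW21[i], a1)) + b2[i]
          for i in range(len(b2))]                     # purelin output layer
    return a1, a2


# Hypothetical weights: 1 input, 2 recurrent tansig neurons, 1 linear output.
IW = [[1.0], [-0.5]]
LW11 = [[0.5, 0.0], [0.0, 0.5]]
b1 = [0.0, 0.0]
LW21 = [[1.0, 1.0]]
b2 = [0.0]

p = [1.0]
_, out_fresh = elman_step(p, [0.0, 0.0], IW, LW11, b1, LW21, b2)
_, out_primed = elman_step(p, [1.0, 1.0], IW, LW11, b1, LW21, b2)
print(out_fresh, out_primed)  # same input, different delayed states, different outputs
```

Running a whole sequence means feeding each returned a1 back in as a1_prev on the next call. That feedback is the delay in the recurrent connection.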
Because the network can store information for future reference, it is able to learn temporal patterns as well as spatial patterns. The Elman network can be trained to respond to, and to generate, both kinds of patterns.


Creating an Elman Network (newelm)


An Elman network with two or more layers can be created with the function newelm. The hidden layers commonly have tansig transfer functions, so that is the default for newelm. As shown in the architecture diagram, purelin is commonly the output-layer transfer function.


The default backpropagation training function is trainbfg. One might use trainlm, but it tends to proceed so rapidly that it does not necessarily do well in the Elman network. The backprop weight/bias learning function default is learngdm, and the default performance function is mse.


When the network is created, each layer's weights and biases are initialized with the Nguyen-Widrow layer initialization method implemented in the function initnw.


Now consider an example. Suppose that we have a sequence of single-element input vectors in the range from 0 to 1. Suppose further that we want to have five hidden-layer tansig neurons and a single logsig output layer. The following code creates the desired network.


net = newelm([0 1],[5 1],{'tansig','logsig'})


Simulation
Suppose that we want to find the response of this network to an input sequence of eight digits that are either 0 or 1. (Because P is generated with rand, your values will differ.)
P = round(rand(1,8))
P =
0 1 0 1 1 0 0 0


Recall that a sequence to be presented to a network is to be in cell array form. We can convert P to this form with


Pseq = con2seq(P)
Pseq =
[0] [1] [0] [1] [1] [0] [0] [0]


Now we can find the output of the network with the function sim.

Y = sim(net,Pseq)

Y =
Columns 1 through 5
[1.9875e-04] [0.1146] [5.0677e-05] [0.0017] [0.9544]
Columns 6 through 8
[0.0014] [5.7241e-05] [3.6413e-05]


We convert this back to concurrent form with
z = seq2con(Y);
and can finally display the output in concurrent form with


z{1,1}
ans =
Columns 1 through 7
0.0002 0.1146 0.0001 0.0017 0.9544 0.0014 0.0001
Column 8
0.0000


Thus, once the network is created and the input specified, one need only call sim.


Training an Elman Network
Elman networks can be trained with either of two functions, train or adapt.


When using the function train to train an Elman network, the following occurs.


At each epoch:


1 The entire input sequence is presented to the network, and its outputs are calculated and compared with the target sequence to generate an error sequence.


2 For each time step, the error is backpropagated to find gradients of errors for each weight and bias. This gradient is actually an approximation since the contributions of weights and biases to errors via the delayed recurrent connection are ignored.


3 This gradient is then used to update the weights with the backprop training function chosen by the user. The function traingdx is recommended.
When using the function adapt to train an Elman network, the following occurs.


At each time step:
1 Input vectors are presented to the network, and it generates an error.


2 The error is backpropagated to find gradients of errors for each weight and bias. This gradient is actually an approximation since the contributions of weights and biases to the error, via the delayed recurrent connection, are ignored.


3 This approximate gradient is then used to update the weights with the learning function chosen by the user. The function learngdm is recommended.


Elman networks are not as reliable as some other kinds of networks because both training and adaption happen using an approximation of the error gradient.


For an Elman network to have the best chance at learning a problem, it needs more hidden neurons in its hidden layer than are actually required for a solution by another method. While a solution may be available with fewer neurons, the Elman network is less able to find the most appropriate weights for hidden neurons since the error gradient is approximated. Therefore, having a fair number of neurons to begin with makes it more likely that the hidden neurons will start out dividing up the input space in useful ways.


The function train trains an Elman network to generate a sequence of target vectors when it is presented with a given sequence of input vectors. The input vectors and target vectors are passed to train as matrices P and T. Train takes these vectors and the initial weights and biases of the network, trains the network using backpropagation with momentum and an adaptive learning rate, and returns new weights and biases.


Let us continue with the example of the previous section, and suppose that we want to train a network with an input P and targets T as defined below


P = round(rand(1,8))
P =
1 0 1 1 1 0 1 1
and
T = [0 (P(1:end-1)+P(2:end) == 2)]
T =
0 0 0 1 1 0 0 1
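The MATLAB expression builds T by adding each element of P to its predecessor and checking for a sum of 2, with a leading 0 because the first step has no predecessor. A plain-Python equivalent (the function name is hypothetical) reproduces the T shown above.

```python
def pair_targets(P):
    """T(1) = 0; T(k) = 1 if P(k-1) and P(k) are both 1, else 0."""
    return [0] + [1 if prev + cur == 2 else 0 for prev, cur in zip(P[:-1], P[1:])]


P = [1, 0, 1, 1, 1, 0, 1, 1]
print(pair_targets(P))  # [0, 0, 0, 1, 1, 0, 0, 1]
```

Because each target depends on the previous input as well as the current one, no memoryless network could produce T from P. This is why the problem needs the Elman network's recurrent state.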


Here T is defined to be 0, except when two consecutive 1s occur in P, in which case T is 1. As noted previously, our network has five hidden neurons in the first layer.


net = newelm([0 1],[5 1],{'tansig','logsig'});


We use trainbfg as the training function and train for 100 epochs. After training we simulate the network with the input P and calculate the difference between the target output and the simulated network output. The input and target sequences are first converted to cell array form with con2seq.


Pseq = con2seq(P);
Tseq = con2seq(T);
net.trainParam.epochs = 100;
net = train(net,Pseq,Tseq);
Y = sim(net,Pseq);
z = seq2con(Y);
diff1 = T - z{1,1}


Note that the difference between the target and the simulated output of the trained network is very small. Thus, the network is trained to produce the desired output sequence on presentation of the input vector P.


See Chapter 11 for an application of the Elman network to the detection of wave amplitudes.
Hopfield Network


Fundamentals


The goal here is to design a network that stores a specific set of equilibrium points such that, when an initial condition is provided, the network eventually comes to rest at such a design point. The network is recursive in that the output is fed back as the input, once the network is in operation. Hopefully, the network output will settle on one of the original design points.


The design method that we present is not perfect in that the designed network 
may have undesired spurious equilibrium points in addition to the desired 
ones. However, the number of these undesired points is made as small as 
possible by the design method. Further, the domain of attraction of the 
designed equilibrium points is as large as possible. 


The design method is based on a system of first-order linear ordinary differential equations that are defined on a closed hypercube of the state space. The solutions exist on the boundary of the hypercube. These systems have the basic structure of the Hopfield model, but are easier to understand and design than the Hopfield model.


The material in this section is based on the following paper: Jian-Hua Li, Anthony N. Michel, and Wolfgang Porod, "Analysis and synthesis of a class of neural networks: linear systems operating on a closed hypercube," IEEE Trans. on Circuits and Systems, vol. 36, no. 11, pp. 1405-1422, November 1989.


For further information on Hopfield networks, read Chapter 18, "Hopfield Network," of [HDB96].


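Architecture
The architecture of the network that we are using follows.

[Figure: Hopfield network. Initial conditions p (R1 x 1) feed a symmetric saturated linear layer: $\mathbf{a}^1(0) = \mathbf{p}$, and then for $k = 1, 2, \ldots$, $\mathbf{a}^1(k) = \mathrm{satlins}(\mathbf{LW}^{1,1}\mathbf{a}^1(k-1) + \mathbf{b}^1)$.]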


As noted, the input p to this network merely supplies the initial conditions.


The Hopfield network uses the saturated linear transfer function satlins.

[Figure: Satlins transfer function, a = satlins(n)]


For inputs less than -1, satlins produces -1. For inputs in the range -1 to +1, it simply returns the input value. For inputs greater than +1, it produces +1.
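The three cases above amount to clipping the net input to the interval [-1, +1]. The following is a plain-Python sketch of the transfer function for a scalar input (not the toolbox's satlins, which operates on matrices).

```python
def satlins(n):
    """Symmetric saturating linear transfer function for a scalar n."""
    return max(-1.0, min(1.0, n))


print([satlins(n) for n in (-3.0, -1.0, 0.4, 1.0, 2.5)])  # [-1.0, -1.0, 0.4, 1.0, 1.0]
```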


This network can be tested with one or more input vectors, which are presented as initial conditions to the network. After the initial conditions are given, the network produces an output which is then fed back to become the input. This process is repeated over and over until the output stabilizes. Hopefully, each output vector eventually converges to one of the design equilibrium point vectors that is closest to the input that provoked it.


Design (newhop)


Li et al. [LiMi89] have studied a system that has the basic structure of the Hopfield network but is, in Li's own words, "easier to analyze, synthesize, and implement than the Hopfield model." The authors are enthusiastic about the reference article, as it has many excellent points and is one of the most readable in the field. However, the design is mathematically complex, and even a short justification of it would burden this guide. Thus, we present the Li design method, with thanks to Li et al., as a recipe that is found in the function newhop.


Given a set of target equilibrium points represented as a matrix T of vectors, newhop returns weights and biases for a recursive network. The network is guaranteed to have stable equilibrium points at the target vectors, but it could contain other spurious equilibrium points as well. The number of these undesired points is made as small as possible by the design method.


Once the network has been designed, it can be tested with one or more input vectors. Hopefully those input vectors close to target equilibrium points will find their targets. As suggested by the network figure, an array of input vectors may be presented at one time or in a batch. The network proceeds to give output vectors that are fed back as inputs. These output vectors can be compared to the target vectors to see how the solution is proceeding.


The ability to run batches of trial input vectors quickly allows you to check the design in a relatively short time. First you might check to see that the target equilibrium point vectors are indeed contained in the network. Then you could try other input vectors to determine the domains of attraction of the target equilibrium points and the locations of spurious equilibrium points if they are present.


Consider the following design example. Suppose that we want to design a network with two stable points in a three-dimensional space.


T = [-1 -1 1; 1 -1 1]'
T =
-1 1
-1 -1
1 1
We can execute the design with


net = newhop(T);


Next we can check to make sure that the designed network is at these two points. We can do this as follows. (Since Hopfield networks have no inputs, the second argument to sim below is Q = 2 when using matrix notation.)

Ai = T;
[Y,Pf,Af] = sim(net,2,[],Ai);
Y

This gives us

Y =
-1 1
-1 -1
1 1


Thus, the network has indeed been designed to be stable at its design points. Next we can try another input condition that is not a design point, such as:


Ai = {[-0.9; -0.8; 0.7]}
This point is reasonably close to the first design point, so one might anticipate that the network would converge to that first point. To see if this happens, we run the following code. Note, incidentally, that we specified the original point in cell array form. This allows us to run the network for more than one step.


[Y,Pf,Af] = sim(net,{1 5},{},Ai);
Y{1}


We get


Thus, an original condition close to a design point did converge to that point. 


This is, of course, our hope for all such inputs. Unfortunately, even the best 
known Hopfield designs occasionally include undesired spurious stable points 
that attract the solution. 
Example

Consider a Hopfield network with just two neurons. Each neuron has a bias and weights to accommodate two-element input vectors. We define the target equilibrium points to be stored in the network as the two columns of the matrix T.


T = [1 -1; -1 1]'
T =
1 -1
-1 1


Here is a plot of the Hopfield state space with the two stable points labeled with '*' markers.

[Figure: Hopfield Network State Space, with the two stable points marked]

These target stable points are given to newhop to obtain weights and biases of a Hopfield network.


net = newhop(T);


The design returns a set of weights and a bias for each neuron. The results are obtained from


W = net.LW{1,1}


which gives
W =
0.6925 -0.4694
-0.4694 0.6925


and from


b = net.b{1,1}


which gives
b =
1.0e-16 *
0.6900
0.6900


Next the design is tested with the target vectors T to see if they are stored in the network. The targets are used as inputs for the simulation function sim.


Ai = T;
[Y,Pf,Af] = sim(net,2,[],Ai);
Y =
1 -1
-1 1


As hoped, the new network outputs are the target vectors. The solution stays at its initial conditions after a single update and, therefore, will stay there for any number of updates.
Now you might wonder how the network performs with various random input vectors. Here is a plot showing the paths that the network took through its state space to arrive at a target point.

[Figure: Hopfield Network State Space, showing trajectories from various starting points]

This plot shows the trajectories of the solution for various starting points. You can try the demonstration demohop1 to see more of this kind of network behavior.
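The trajectories in the plot come from repeatedly applying a1(k) = satlins(LW1,1 a1(k-1) + b1). The plain-Python sketch below runs that recursion with the rounded W and b displayed above, so its results are illustrative rather than toolbox output. It shows a design point staying put and two off-design starting points settling on the nearer design point. The function names and the step limit are assumptions.

```python
def satlins(n):
    """Clip n to the interval [-1, +1]."""
    return max(-1.0, min(1.0, n))


def hopfield_run(W, b, a0, max_steps=100):
    """Iterate a(k) = satlins(W a(k-1) + b) until a(k) == a(k-1) or max_steps."""
    a = [float(x) for x in a0]
    for _ in range(max_steps):
        nxt = [satlins(sum(w * x for w, x in zip(row, a)) + bi)
               for row, bi in zip(W, b)]
        if nxt == a:        # fixed point reached: further updates change nothing
            return a
        a = nxt
    return a                # may not have settled within max_steps


# Weights and biases as displayed above (rounded to four decimals).
W = [[0.6925, -0.4694], [-0.4694, 0.6925]]
b = [0.69e-16, 0.69e-16]

print(hopfield_run(W, b, [1, -1]))       # a design point stays where it is
print(hopfield_run(W, b, [0.3, -0.1]))   # settles on [1, -1]
print(hopfield_run(W, b, [-0.2, 0.6]))   # settles on [-1, 1]
```

With these weights, W stretches the [1, -1] direction by about 1.16 per step and shrinks the [1, 1] direction by about 0.22. Starting points therefore drift toward whichever design point lies on their side until satlins pins them at the corner.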


Hopfield networks can be designed for an arbitrary number of dimensions. You can try demohop3 to see a three-dimensional design.


Unfortunately, Hopfield networks can have both unstable equilibrium points and spurious stable points. You can try demonstration programs demohop2 and demohop4 to investigate these issues.


Summary


Elman networks, by having an internal feedback loop, are capable of learning to detect and generate temporal patterns. This makes Elman networks useful in such areas as signal processing and prediction, where time plays a dominant role.


Because Elman networks are an extension of the two-layer sigmoid/linear architecture, they inherit the ability to fit any input/output function with a finite number of discontinuities. They are also able to fit temporal patterns, but may need many neurons in the recurrent layer to fit a complex function.


Hopfield networks can act as error correction or vector categorization networks. Input vectors are used as the initial conditions to the network, which recurrently updates until it reaches a stable output vector.


Hopfield networks are interesting from a theoretical standpoint, but are seldom used in practice. Even the best Hopfield designs may have spurious stable points that lead to incorrect answers. More efficient and reliable error correction techniques, such as backpropagation, are available.
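Figures


Elman Network

[Figure: Input p (R x 1) feeds a recurrent tansig layer and an output purelin layer, with $\mathbf{a}^1(k) = \mathrm{tansig}(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{LW}^{1,1}\mathbf{a}^1(k-1) + \mathbf{b}^1)$ and $\mathbf{a}^2(k) = \mathrm{purelin}(\mathbf{LW}^{2,1}\mathbf{a}^1(k) + \mathbf{b}^2)$.]


Hopfield Network

[Figure: Initial conditions p (R1 x 1) feed a symmetric saturated linear layer: $\mathbf{a}^1(0) = \mathbf{p}$, and then for $k = 1, 2, \ldots$, $\mathbf{a}^1(k) = \mathrm{satlins}(\mathbf{LW}^{1,1}\mathbf{a}^1(k-1) + \mathbf{b}^1)$.]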
New Functions
This chapter introduces the following new functions.

Function    Description
newelm      Create an Elman backpropagation network.
newhop      Create a Hopfield recurrent network.
satlins     Symmetric saturating linear transfer function.
Introduction


The ADALINE (Adaptive Linear Neuron) networks discussed in this
chapter are similar to the perceptron, but their transfer function is linear
rather than hard-limiting. This allows their outputs to take on any value,
whereas the perceptron output is limited to either 0 or 1. Both the ADALINE
and the perceptron can only solve linearly separable problems. However, here
we will make use of the LMS (Least Mean Squares) learning rule, which is
much more powerful than the perceptron learning rule. The LMS, or
Widrow-Hoff, learning rule minimizes the mean square error and, thus, moves
the decision boundaries as far as it can from the training patterns.


In this chapter, we design an adaptive linear system that responds to changes
in its environment as it is operating. Linear networks that are adjusted at each
time step based on new input and target vectors can find weights and biases
that minimize the network's sum-squared error for recent input and target
vectors. Networks of this sort are often used in error cancellation, signal
processing, and control systems.


The pioneering work in this field was done by Widrow and Hoff, who gave the
name ADALINE to adaptive linear elements. The basic reference on this
subject is: Widrow, B., and S. D. Stearns, Adaptive Signal Processing, New York:
Prentice-Hall, 1985.


We also consider the adaptive training of self-organizing and competitive
networks in this chapter.


Important Adaptive Functions


This chapter introduces the function adapt, which changes the weights and 
biases of a network incrementally during training. 


You can type help linnet to see a list of linear and adaptive network
functions, demonstrations, and applications.


Linear Neuron Model


A linear neuron with R inputs is shown below.


[Figure: Linear neuron with vector input, where R = number of elements in the input vector]

$$a = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b)$$


This network has the same basic structure as the perceptron. The only
difference is that the linear neuron uses a linear transfer function, which we
name purelin.





[Figure: Linear transfer function, $a = \mathrm{purelin}(n)$]


The linear transfer function calculates the neuron's output by simply returning
the value passed to it.


$$a = \mathrm{purelin}(n) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \mathbf{W}\mathbf{p} + b$$


This neuron can be trained to learn an affine function of its inputs, or to find a
linear approximation to a nonlinear function. A linear network cannot, of
course, be made to perform a nonlinear computation.
Adaptive Linear Network Architecture


The ADALINE network shown below has one layer of S neurons connected to
R inputs through a matrix of weights W.


[Figure: Layer of linear neurons in full and abbreviated notation, where R = number of elements in the input vector and S = number of neurons in the layer]

$$\mathbf{a} = \mathrm{purelin}(\mathbf{W}\mathbf{p} + \mathbf{b})$$


This network is sometimes called a MADALINE, for Many ADALINEs. Note
that the figure on the right defines an S-length output vector a.


The Widrow-Hoff rule can only train single-layer linear networks. This is not
much of a disadvantage, however, as single-layer linear networks are just as
capable as multilayer linear networks. For every multilayer linear network,
there is an equivalent single-layer linear network.


Single ADALINE (newlin) 


Consider a single ADALINE with two inputs. The diagram for this network is
shown below.
[Figure: Simple ADALINE with a two-element input]

$$a = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b)$$


The weight matrix W in this case has only one row. The network output is:
$$a = \mathrm{purelin}(n) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \mathbf{W}\mathbf{p} + b$$
or
$$a = w_{1,1}p_1 + w_{1,2}p_2 + b$$


Like the perceptron, the ADALINE has a decision boundary that is determined
by the input vectors for which the net input n is zero. For n = 0 the equation
Wp + b = 0 specifies such a decision boundary, as shown below (adapted with
thanks from [HDB96]).


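[Figure: ADALINE decision boundary Wp + b = 0, separating the region a > 0 (gray) from the region a < 0 (white)]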











Input vectors in the upper right gray area lead to an output greater than 0.
Input vectors in the lower left white area lead to an output less than 0. Thus,
the ADALINE can be used to classify objects into two categories. Now you can
find the network output with the function sim.
a = sim(net,p)
a =
    24


To summarize, you can create an ADALINE network with newlin, adjust its
elements as you want, and simulate it with sim. You can find more about newlin
by typing help newlin.
Mean Square Error


Like the perceptron learning rule, the least mean square error (LMS)
algorithm is an example of supervised training, in which the learning rule is
provided with a set of examples of desired network behavior:


$$\{\mathbf{p}_1, t_1\}, \{\mathbf{p}_2, t_2\}, \ldots, \{\mathbf{p}_Q, t_Q\}$$


Here $\mathbf{p}_q$ is an input to the network, and $t_q$ is the corresponding target output.
As each input is applied to the network, the network output is compared to the
target. The error is calculated as the difference between the target output and
the network output. We want to minimize the average of the sum of the squares
of these errors.


$$\mathrm{mse} = \frac{1}{Q}\sum_{k=1}^{Q} e(k)^2 = \frac{1}{Q}\sum_{k=1}^{Q}\left(t(k) - a(k)\right)^2$$


The LMS algorithm adjusts the weights and biases of the ADALINE so as to
minimize this mean square error.


Fortunately, the mean square error performance index for the ADALINE
network is a quadratic function. Thus, the performance index will either have
one global minimum, a weak minimum, or no minimum, depending on the
characteristics of the input vectors. Specifically, the characteristics of the input
vectors determine whether or not a unique solution exists.


You can find more about this topic in Chapter 10 of [HDB96].
LMS Algorithm (learnwh)


Adaptive networks will use the LMS algorithm, or Widrow-Hoff learning
algorithm, which is based on an approximate steepest descent procedure. Here again,
adaptive linear networks are trained on examples of correct behavior.


The LMS algorithm, shown below, is discussed in detail in “Linear Filters” in
Chapter 4.


$$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha e(k)\mathbf{p}^T(k)$$
$$\mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha e(k)$$


Adaptive Filtering (adapt)


The ADALINE network, much like the perceptron, can only solve linearly
separable problems. Nevertheless, the ADALINE has been, and is today, one of
the most widely used neural networks found in practical applications. Adaptive
filtering is one of its major application areas.


Tapped Delay Line 

We need a new component, the tapped delay line, to make full use of the
ADALINE network. Such a delay line is shown below. There the input signal
enters from the left and passes through N-1 delays. The output of the tapped
delay line (TDL) is an N-dimensional vector, made up of the input signal at the
current time, the previous input signal, etc.





Adaptive Filter 


We can combine a tapped delay line with an ADALINE network to create the
adaptive filter shown below.
[Figure: Adaptive filter, a tapped delay line feeding a linear layer]





The output of the filter is given by
$$a(k) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \sum_{i=1}^{R} w_{1,i}\,p(k-i+1) + b$$


The network shown above is referred to in the digital signal processing field as
a finite impulse response (FIR) filter [WiSt85]. Let us take a look at the code
that we use to generate and simulate such an adaptive network.


Adaptive Filter Example


First we will define a new linear network using newlin. 


[Figure: Linear digital filter. The input p(k) passes through a tapped delay line producing p(k), p(k-1), and p(k-2), and the filter output is a = purelin(Wp + b)]


Assume that the input values have a range from 0 to 10. We can now define our
single-output network.


net = newlin([0,10],1);
We can specify the delays in the tapped delay line with
net.inputWeights{1,1}.delays = [0 1 2];


This says that the delay line is connected to the network weight matrix through
delays of 0, 1, and 2 time units. (You can specify as many delays as you want,
and can omit some values if you like. They must be in ascending order.)


We can give the various weights and the bias values with 


net.IW{1,1} = [7 8 9];
net.b{1} = [0];


Finally we will define the initial values of the outputs of the delays as
pi = {1 2};


Note that these are ordered from left to right to correspond to the delays taken
from top to bottom in the figure. This concludes the setup of the network. Now
how about the input?
We assume that the input scalars arrive in a sequence: first the value 3, then
the value 4, next the value 5, and finally the value 6. We can indicate this
sequence by defining the values as elements of a cell array. (Note the curly
brackets.)


p = {3 4 5 6};


Now we have a network and a sequence of inputs. We can simulate the network
to see what its output is as a function of time.


[a,pf] = sim(net,p,pi)
This yields an output sequence

a =
    [46]    [70]    [94]    [118]


and final values for the delay outputs of

pf =
    [5]    [6]

The example is sufficiently simple that you can check it by hand to make sure
that you understand the inputs, initial values of the delays, etc.


The network that we have defined can be trained with the function adapt to
produce a particular output sequence. Suppose, for instance, we would like the
network to produce the sequence of values 10, 20, 30, and 40.


T = {10 20 30 40};


We can train our defined network to do this, starting from the initial delay
conditions that we used above. We specify 10 passes through the input
sequence with


net.adaptParam.passes = 10;
Then we can do the training with
[net,y,E,pf,af] = adapt(net,p,T,pi);
This code returns the final weights, bias, and output sequence shown below. 


wts = net.IW{1,1}
wts =
    0.5059    3.1053    5.7046

bias = net.b{1}
bias =
   -1.5993
y =
    [11.8558]    [20.7735]    [29.6679]    [39.0036]


Presumably, if we had run additional passes, the output sequence would have
been even closer to the desired values of 10, 20, 30, and 40.


Thus, adaptive networks can be specified, simulated, and finally trained with
adapt. However, the outstanding value of adaptive networks lies in their use
to perform a particular function, such as prediction or noise cancellation.


Prediction Example


Suppose that we want to use an adaptive filter to predict the next value of a
stationary random process, p(t). We use the network shown below to do this.


[Figure: Predictive filter. The input p(t) enters a tapped delay line whose outputs p(t-1) and p(t-2) feed a = purelin(Wp + b); the target is p(t), and the error e(t) is used to adjust the weights. a(t) is an approximation to p(t).]


The signal to be predicted, p(t), enters from the left into a tapped delay line.
The previous two values of p(t) are available as outputs from the tapped delay
line. The network uses adapt to change the weights on each time step so as to
minimize the error e(t) on the far right. If this error is zero, then the network
output a(t) is exactly equal to p(t), and the network has done its prediction
properly.
A detailed analysis of this network is not appropriate here, but we can state the
main points. Given the autocorrelation function of the stationary random
process p(t), the error surface, the maximum learning rate, and the optimum
values of the weights can be calculated. Commonly, of course, one does not have
detailed information about the random process, so these calculations cannot be
performed. But this lack does not matter to the network. The network, once
initialized and operating, adapts at each time step to minimize the error and
in a relatively short time is able to predict the input p(t).


Chapter 10 of [HDB96] presents this problem, goes through the analysis, and
shows the weight trajectory during training. The network finds the optimum
weights on its own without any difficulty whatsoever.


You also can try the demonstration program nnd10nc to see an adaptive noise
cancellation program example in action. This demonstration allows you to pick
a learning rate and momentum (see Chapter 5), and shows the learning
trajectory, and the original and cancellation signals versus time.


Noise Cancellation Example 


Consider a pilot in an airplane. When the pilot speaks into a microphone, the
engine noise in the cockpit is added to the voice signal, and the resultant signal
heard by passengers would be of low quality. We would like to obtain a signal
that contains the pilot's voice, but not the engine noise. We can do this with an
adaptive filter if we obtain a sample of the engine noise and apply it as the
input to the adaptive filter.


[Figure: Noise cancellation. The pilot's voice is contaminated with engine noise that reaches the microphone through a noise path filter. The adaptive filter receives the engine noise and produces filtered noise to cancel the contamination, and the "error" is the restored signal. The adaptive filter adjusts to minimize the error. This removes the engine noise from the contaminated signal, leaving the pilot's voice as the "error."]


Here we adaptively train the neural linear network to predict the combined
pilot/engine signal m from an engine signal n. Notice that the engine signal n
does not tell the adaptive network anything about the pilot's voice signal
contained in m. However, the engine signal n does give the network
information it can use to predict the engine's contribution to the pilot/engine
signal m.


The network will do its best to adaptively output m. In this case, the network
can only predict the engine interference noise in the pilot/engine signal m. The
network error e is equal to m, the pilot/engine signal, minus the predicted
contaminating engine noise signal. Thus, e contains only the pilot's voice! Our
linear adaptive network adaptively learns to cancel the engine noise.


Note, in closing, that such adaptive noise canceling generally does a better job
than a classical filter, because the noise here is subtracted from, rather than
filtered out of, the signal m.
Try demolin8 for an example of adaptive noise cancellation.


Multiple Neuron Adaptive Filters

We may want to use more than one neuron in an adaptive system, so we need
some additional notation. A tapped delay line can be used with S linear
neurons as shown below.


[Figure: Tapped delay line feeding a linear layer of S neurons]





Alternatively, we can show this same network in abbreviated form. 
[Figure: Abbreviated form of a linear layer of S neurons fed by a tapped delay line, with input p(k), delayed input vector pd(k), weight matrix W, net input n(k), and output a(k)]


If we want to show more of the detail of the tapped delay line and there are not
too many delays, we can use the following notation.


[Figure: Abbreviated notation showing a tapped delay line with delays 0, 1, and 2 feeding a linear layer]


Here we have a tapped delay line that sends the current signal, the previous
signal, and the signal delayed before that to the weight matrix. We could have
a longer list, and some delay values could be omitted if desired. The only
requirement is that the delays are shown in increasing order as they go from
top to bottom.
Summary

The ADALINE (Adaptive Linear Neuron) networks discussed in this
chapter are similar to the perceptron, but their transfer function is linear
rather than hard-limiting. They make use of the LMS (Least Mean Squares)
learning rule, which is much more powerful than the perceptron learning rule.
The LMS, or Widrow-Hoff, learning rule minimizes the mean square error and,
thus, moves the decision boundaries as far as it can from the training patterns.


In this chapter, we design an adaptive linear system that responds to changes
in its environment as it is operating. Linear networks that are adjusted at each
time step based on new input and target vectors can find weights and biases
that minimize the network's sum-squared error for recent input and target
vectors.


Adaptive linear filters have many practical applications such as noise
cancellation, signal processing, and prediction in control and communication
systems.


This chapter introduces the function adapt, which changes the weights and
biases of a network incrementally during training.


Figures and Equations 


Linear Neuron 


[Figure: Linear neuron with vector input, where R = number of elements in the input vector]

$$a = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b)$$





Purelin Transfer Function

[Figure: Linear transfer function, $a = \mathrm{purelin}(n)$]


MADALINE

[Figure: Layer of linear neurons, where R = number of elements in the input vector and S = number of neurons in the layer]

$$\mathbf{a} = \mathrm{purelin}(\mathbf{W}\mathbf{p} + \mathbf{b})$$
ADALINE

[Figure: Simple ADALINE with a two-element input]

$$a = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b)$$


Decision Boundary 


[Figure: ADALINE decision boundary Wp + b = 0, separating the region a > 0 from the region a < 0]











Mean Square Error 


$$\mathrm{mse} = \frac{1}{Q}\sum_{k=1}^{Q} e(k)^2 = \frac{1}{Q}\sum_{k=1}^{Q}\left(t(k) - a(k)\right)^2$$
LMS (Widrow-Hoff) Algorithm
$$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha e(k)\mathbf{p}^T(k)$$
$$\mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha e(k)$$


Tapped Delay Line 
Adaptive Filter

[Figure: Adaptive filter, a tapped delay line feeding a linear layer]
Adaptive Filter Example


[Figure: Linear digital filter. The input p(k) passes through a tapped delay line producing p(k), p(k-1), and p(k-2), and the filter output is a = purelin(Wp + b)]


Prediction Example 


[Figure: Predictive filter. The input p(t) enters a tapped delay line whose outputs p(t-1) and p(t-2) feed a = purelin(Wp + b); the target is p(t), and the error is used to adjust the weights. a(t) is an approximation to p(t).]
Noise Cancellation Example

[Figure: Noise cancellation. The pilot's voice is contaminated with engine noise that reaches the microphone through a noise path filter. The adaptive filter receives the engine noise and produces filtered noise to cancel the contamination, and the "error" is the restored signal. The adaptive filter adjusts to minimize the error. This removes the engine noise from the contaminated signal, leaving the pilot's voice as the "error."]


Multiple Neuron Adaptive Filter


[Figure: Tapped delay line feeding a linear layer of S neurons]





Abbreviated Form of Adaptive Filter 


[Figure: Linear layer of S neurons fed by a tapped delay line, with input p(k), delayed input vector pd(k), weight matrix W, net input n(k), and output a(k)]
Small Specific Adaptive Filter

[Figure: Abbreviated notation showing a tapped delay line with delays 0, 1, and 2 feeding a linear layer]


New Functions
This chapter introduces the following new function.

Function    Description
adapt       Trains a network using a sequence of inputs.
Introduction


Today, neural networks can solve problems of economic importance that could
not be approached previously in any practical way. Some recent neural
network applications are discussed in this chapter. See Chapter 1 for a list of
many areas where neural networks already have been applied.





Note  The rest of this chapter describes applications that are practical and
make extensive use of the neural network functions described in this User's
Guide.





Application Scripts 


The linear network applications are contained in scripts applin1 and applin2.


The Elman network amplitude detection application is contained in the script
appelm1.


The character recognition application is in appcr1. 


Type help nndemos to see a listing of all neural network demonstrations or
applications.


Applin1: Linear Design


Problem Definition 
Here is the definition of a signal T, which lasts 5 seconds, and is defined at a
sampling rate of 40 samples per second.
time = 0:0.025:5;
T = sin(time*4*pi);
Q = length(T);
At any given time step, the network is given the last five values of the signal T,
and expected to give the next value. The inputs P are found by delaying the
signal T from one to five time steps.


Here is a plot of the signal T.


[Figure: Signal to be Predicted, showing the target signal versus time]
Network Design


Because the relationship between past and future values of the signal is not
changing, the network can be designed directly from examples using newlind.


The problem as defined above has five inputs (the five delayed signal values),
and one output (the next signal value). Thus, the network solution must consist
of a single neuron with five inputs.


[Figure: Linear neuron with five inputs]

$$a = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b)$$


Here newlind finds the weights and biases, for the neuron above, that
minimize the sum-squared error for this problem.
net = newlind(P,T);


The resulting network can now be tested. 


Network Testing


To test the network, its output a is computed for the five delayed signals P and 
compared with the actual signal T. 


a = sim(net,P);


Here is a plot of a compared to T. 


[Figure: Output and Target Signals]

The network's output a and the actual signal T appear to match up perfectly.
Just to be sure, let us plot the error e = T - a.


[Figure: Error Signal, plotting the error against time from 0 to 5 seconds]

The network did have some error for the first few time steps. This occurred
because the network did not actually have five delayed signal values available
until the fifth time step. However, after the fifth time step the error was negligible.
The linear network did a good job. Run the script applin1 to see these plots.
Thoughts and Conclusions


While newlind is not able to return a zero error solution for nonlinear
problems, it does minimize the sum-squared error. In many cases, the solution,
while not perfect, may model a nonlinear relationship well enough to meet the
application specifications. Giving the linear network many delayed signal
values gives it more information with which to find the lowest error linear fit
for a nonlinear problem.


Of course, if the problem is very nonlinear and/or the desired error is very low,
backpropagation or radial basis networks would be more appropriate.


Applin2: Adaptive Prediction


In application script applin2, a linear network is trained incrementally with
adapt to predict a time series. Because the linear network is trained
incrementally, it can respond to changes in the relationship between past and
future values of the signal.


Problem Definition


The signal T to be predicted lasts 6 seconds with a sampling rate of 20 samples
per second. However, after 4 seconds the signal's frequency suddenly doubles.


time1 = 0:0.05:4;
time2 = 4.05:0.05:6;
time = [time1 time2];
T = [sin(time1*4*pi) sin(time2*8*pi)];
Since we are training the network incrementally, we change T to a sequence.

T = con2seq(T);


Here is a plot of this signal.
The input to the network is the same signal that makes up the target.
P = T;
Network Initialization

The network has only one neuron, as only one output value of the signal T is
being generated at each time step. This neuron has five inputs, the five delayed
values of the signal T.


[Figure: Linear Layer]





The function newlin creates the network shown above. We use a learning rate
of 0.1 for incremental training.

lr = 0.1;
delays = [1 2 3 4 5];
net = newlin(minmax(cat(2,P{:})),1,delays,lr);


Network Training 


The above neuron is trained incrementally with adapt. Here is the code to train
the network on input/target signals P and T.


[net,a,e] = adapt(net,P,T);


Network Testing


Once the network is adapted, we can plot its output signal and compare it to 
the target signal. 


[Figure: Output and Target Signals]
Initially, it takes the network 1.5 seconds (30 samples) to track the target
signal. Then, the predictions are accurate until the fourth second, when the
target signal suddenly changes frequency. However, the adaptive network
learns to track the new signal in an even shorter interval, as it has already
learned a behavior (a sine wave) similar to the new signal.


A plot of the error signal makes these effects easier to see.
Thoughts and Conclusions


The linear network was able to adapt very quickly to the change in the target
signal. The 30 samples required to learn the wave form are very impressive
when one considers that in a typical signal processing application, a signal may
be sampled at 20 kHz. At such a sampling frequency, 30 samples go by in 1.5
milliseconds.


For example, the adaptive network can be monitored so as to give a warning
that its constants were nearing values that would result in instability.


Another use for an adaptive linear model is suggested by its ability to find a
minimum sum-squared error linear estimate of a nonlinear system's behavior.
An adaptive linear model is highly accurate as long as the nonlinear system
stays near a given operating point. If the nonlinear system moves to a different
operating point, the adaptive linear network changes to model it at the new
point.


The sampling rate should be high to obtain the linear model of the nonlinear
system at its current operating point in the shortest amount of time. However,
there is a minimum amount of time that must occur for the network to see
enough of the system's behavior to properly model it. To minimize this time, a
small amount of noise can be added to the input signals of the nonlinear
system. This allows the network to adapt faster as more of the operating point's
dynamics are expressed in a shorter amount of time. Of course, this noise
should be small enough so it does not affect the system's usefulness.


Appelm1: Amplitude Detection


Elman networks can be trained to recognize and produce both spatial and
temporal patterns. An example of a problem where temporal patterns are
recognized and classified with a spatial pattern is amplitude detection.


Amplitude detection requires that a wave form be presented to a network
through time, and that the network output the amplitude of the wave form.
This is not a difficult problem, but it demonstrates the Elman network design
process.


The following material describes code that is contained in the demonstration
script appelm1.


Problem Definition


The following code defines two sine wave forms, one with an amplitude of 1.0,
the other with an amplitude of 2.0.


p1 = sin(1:20);
p2 = sin(1:20)*2;


The target outputs for these wave forms are their amplitudes.
t1 = ones(1,20);
t2 = ones(1,20)*2;


These wave forms can be combined into a sequence where each wave form
occurs twice. These longer wave forms are used to train the Elman network.

p = [p1 p2 p1 p2];
t = [t1 t2 t1 t2];


We want the inputs and targets to be considered a sequence, so we need to
make the conversion from the matrix format.

Pseq = con2seq(p);
Tseq = con2seq(t);


Network Initialization


This problem requires that the Elman network detect a single value (the
signal), and output a single value (the amplitude), at each time step. Therefore
the network must have one input element, and one output neuron.
R = 1;   % 1 input element
S2 = 1;  % 1 layer 2 output neuron


The recurrent layer can have any number of neurons. However, as the 
complexity of the problem grows, more neurons are needed in the recurrent 
layer for the network to do a good job. 


This problem is fairly simple, so only 10 recurrent neurons are used in the first
layer.


S1 = 10;  % 10 recurrent neurons in the first layer


Now the function newelm can be used to create initial weight matrices and bias
vectors for a network with one input that can vary between -2 and +2. We use
variable learning rate (traingdx) for this example.


net = newelm([-2 2],[S1 S2],{'tansig','purelin'},'traingdx');
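To make the architecture concrete, here is a plain-Python sketch of the forward pass of such a network (one input, 10 tansig recurrent neurons, one purelin output) over the training sequence defined above. It is only an illustration: the weights are hypothetical, untrained random values, the recurrent state is assumed to start at zero, and training with traingdx is not shown. tansig is mathematically identical to tanh, so `math.tanh` is used; the function name `elman_forward` is ours.

```python
import math
import random


def elman_forward(inputs, IW, LW11, b1, LW21, b2):
    """Run a two-layer Elman network over a scalar input sequence.

    a1(k) = tansig(IW p(k) + LW11 a1(k-1) + b1), starting from a1(0) = 0,
    a2(k) = purelin(LW21 a1(k) + b2). tansig equals tanh, so math.tanh is used.
    """
    s1 = len(b1)
    a1 = [0.0] * s1                               # recurrent layer state
    outputs = []
    for p in inputs:
        n1 = [IW[i] * p + sum(LW11[i][j] * a1[j] for j in range(s1)) + b1[i]
              for i in range(s1)]
        a1 = [math.tanh(n) for n in n1]           # recurrent tansig layer
        outputs.append(sum(w * x for w, x in zip(LW21, a1)) + b2)  # purelin output
    return outputs


# Training sequence from the text: sin(1:20) with amplitudes 1 and 2, each twice.
p1 = [math.sin(k) for k in range(1, 21)]
p2 = [2.0 * x for x in p1]
p = p1 + p2 + p1 + p2
t = [1.0] * 20 + [2.0] * 20 + [1.0] * 20 + [2.0] * 20

# Hypothetical, untrained weights for R = 1 input, S1 = 10, S2 = 1.
rng = random.Random(1)
S1 = 10
IW = [rng.uniform(-0.5, 0.5) for _ in range(S1)]
LW11 = [[rng.uniform(-0.5, 0.5) for _ in range(S1)] for _ in range(S1)]
b1 = [rng.uniform(-0.5, 0.5) for _ in range(S1)]
LW21 = [rng.uniform(-0.5, 0.5) for _ in range(S1)]
b2 = 0.0

y = elman_forward(p, IW, LW11, b1, LW21, b2)
print(len(y), len(t))   # one output per time step: 80 80
```

The feedback term LW11·a1(k-1) is what lets the output depend on earlier samples: with LW11 set to zero, identical inputs at different times would always give identical outputs, and the network could not accumulate evidence about the amplitude.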


Network Training
Now call train.

[net,tr] = train(net,Pseq,Tseq);


As this function finishes training at 500 epochs, it displays the following plot 
of errors. 


[Figure: Mean Squared Error of Elman Network]
The final mean-squared error was about 1.8e-2. We can test the network to see
what this means.


Network Testing
To test the network, the original inputs are presented, and its outputs are
calculated with sim.

a = sim(net,Pseq);


Here is the plot. 


[Figure: Testing Amplitude Detection, showing the target and the network output versus time step]


The network does a good job. New wave amplitudes are detected within a few
samples. More neurons in the recurrent layer and longer training times would
result in even better performance.

The network has successfully learned to detect the amplitudes of incoming sine
waves.


Network Generalization


Of course, even if the network detects the amplitudes of the training wave
forms, it may not detect the amplitude of a sine wave with an amplitude it has
not seen before.


The following code defines a new wave form made up of two repetitions of a sine
wave with amplitude 1.6 and another with amplitude 1.2.

p3 = sin(1:20)*1.6;
t3 = ones(1,20)*1.6;
p4 = sin(1:20)*1.2;
t4 = ones(1,20)*1.2;
pg = [p3 p4 p3 p4];
tg = [t3 t4 t3 t4];
pgseq = con2seq(pg);


The input sequence pg and target sequence tg are used to test the ability of our
network to generalize to new amplitudes.


Once again the function sim is used to simulate the Elman network and the
results are plotted.

a = sim(net,pgseq);


[Figure: Testing Generalization]

This time the network did not do as well. It seems to have a vague idea as to
what it should do, but is not very accurate!


Improved generalization could be obtained by training the network on more
amplitudes than just 1.0 and 2.0. The use of three or four different wave forms
with different amplitudes can result in a much better amplitude detector.
Improving Performance

Run appelm1 to see plots similar to those above. Then make a copy of this file
and try improving the network by adding more neurons to the recurrent layer,
using longer training times, and giving the network more examples in its
training data.
Appcr1: Character Recognition

It is often useful to have a machine perform pattern recognition. In particular,
machines that can read symbols are very cost effective. A machine that reads
banking checks can process many more checks than a human being in the same
time. This kind of application saves time and money, and eliminates the
requirement that a human perform such a repetitive task. The script appcr1
demonstrates how character recognition can be done with a backpropagation
network.


Problem Statement

A network is to be designed and trained to recognize the 26 letters of the
alphabet. An imaging system that digitizes each letter centered in the system's
field of vision is available. The result is that each letter is represented as a 5 by
7 grid of Boolean values.


For example, here is the letter A.

[Figure: The letter A as a 5 by 7 grid]


However, the imaging system is not perfect and the letters may suffer from
noise.
Perfect classification of ideal input vectors is required, along with reasonably
accurate classification of noisy vectors.


The twenty-six 35-element input vectors are defined in the function prprob as
a matrix of input vectors called alphabet. The target vectors are also defined
in this file with a variable called targets. Each target vector is a 26-element
vector with a 1 in the position of the letter it represents, and 0's everywhere
else. For example, the letter A is to be represented by a 1 in the first element
(as A is the first letter of the alphabet), and 0's in elements two through
twenty-six.
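A minimal sketch, assuming only base MATLAB: the identity matrix eye(26) has exactly the one-hot layout described above, so column k is the target for the kth letter. Whether prprob builds targets this way is an assumption, and the name targetsSketch is hypothetical.

```matlab
% Sketch: one-hot target vectors for the 26 letters (assumed layout).
% Column k has a 1 in row k and 0s in every other row.
targetsSketch = eye(26);

% Target vector for the letter A, the first letter of the alphabet.
tA = targetsSketch(:, 1);
```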


Neural Network 


The network receives the 35 Boolean values as a 35-element input vector. It is
then required to identify the letter by responding with a 26-element output
vector. The 26 elements of the output vector each represent a letter. To operate
correctly, the network should respond with a 1 in the position of the letter being
presented to the network. All other values in the output vector should be 0.


In addition, the network should be able to handle noise. In practice, the 
network does not receive a perfect Boolean vector as input. Specifically, the 
network should make as few mistakes as possible when classifying vectors with
noise of mean 0 and standard deviation of 0.2 or less. 


Architecture 


The neural network needs 35 inputs and 26 neurons in its output layer to
identify the letters. The network is a two-layer log-sigmoid/log-sigmoid
network. The log-sigmoid transfer function was picked because its output
range (0 to 1) is perfect for learning to output Boolean values.


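[Figure: two-layer log-sigmoid network. Input p1 (35 x 1); hidden layer of 10 neurons; output layer of 26 neurons giving a2 (26 x 1)]

$$a^1 = \mathrm{logsig}\left(IW^{1,1} p^1 + b^1\right), \qquad a^2 = \mathrm{logsig}\left(LW^{2,1} a^1 + b^2\right)$$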
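The two equations above can be sketched in base MATLAB without the toolbox. The weights are random placeholders rather than trained values, and logsigFn is a hand-written stand-in for logsig, assumed to compute 1/(1 + e^(-n)).

```matlab
% Sketch of the two-layer log-sigmoid forward pass for one 35-element input.
% Weights and biases are random placeholders, not trained values.
logsigFn = @(n) 1 ./ (1 + exp(-n));   % log-sigmoid, output between 0 and 1

R = 35; S1 = 10; S2 = 26;             % inputs, hidden neurons, output neurons
IW11 = randn(S1, R);  b1 = randn(S1, 1);
LW21 = randn(S2, S1); b2 = randn(S2, 1);

p1 = double(rand(R, 1) > 0.5);        % stand-in Boolean input vector
a1 = logsigFn(IW11 * p1 + b1);        % hidden layer output, 10x1
a2 = logsigFn(LW21 * a1 + b2);        % output layer, 26x1
```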


The hidden (first) layer has 10 neurons. This number was picked by guesswork
and experience. If the network has trouble learning, then neurons can be added
to this layer.


The network is trained to output a 1 in the correct position of the output vector
and to fill the rest of the output vector with 0's. However, noisy input vectors
may result in the network not creating perfect 1s and 0's. After the network is
trained the output is passed through the competitive transfer function compet.
This makes sure that the output corresponding to the letter most like the noisy
input vector takes on a value of 1, and all others have a value of 0. The result
of this post-processing is the output that is actually used.
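A hand-written sketch of the winner-take-all step that compet is described as performing: the largest element of each column becomes 1 and the rest become 0. The sample outputs in aOut are hypothetical, and ties (which the text does not discuss) go to the first maximum here.

```matlab
% Hypothetical noisy network outputs: 4 classes (rows) for 2 input vectors (columns).
aOut = [0.1 0.3; 0.7 0.2; 0.2 0.9; 0.65 0.1];

[~, winner] = max(aOut, [], 1);       % row index of the largest output per column
y = zeros(size(aOut));
y(sub2ind(size(aOut), winner, 1:size(aOut, 2))) = 1;   % exactly one 1 per column
```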


Initialization
The two-layer network is created with newff. 
S1 = 10;
[R,Q] = size(alphabet);
[S2,Q] = size(targets);
P = alphabet;
net = newff(minmax(P),[S1 S2],{'logsig' 'logsig'},'traingdx');


Training 

To create a network that can handle noisy input vectors it is best to train the
network on both ideal and noisy vectors. To do this, the network is first trained
on ideal vectors until it has a low sum-squared error.

Then, the network is trained on 10 sets of ideal and noisy vectors. The network
is trained on two copies of the noise-free alphabet at the same time as it is
trained on noisy vectors. The two copies of the noise-free alphabet are used to
maintain the network's ability to classify ideal input vectors.


Unfortunately, after the training described above the network may have 
learned to classify some difficult noisy vectors at the expense of properly
classifying a noise-free vector. Therefore, the network is again trained on just 
ideal vectors. This ensures that the network responds perfectly when 
presented with an ideal letter. 


All training is done using backpropagation with both adaptive learning rate
and momentum with the function traingdx.


Training Without Noise 


The network is initially trained without noise for a maximum of 5000 epochs 
or until the network sum-squared error falls beneath 0.1. 


P = alphabet;
T = targets;
net.performFcn = 'sse';
net.trainParam.goal = 0.1;
net.trainParam.show = 20;
net.trainParam.epochs = 5000;
net.trainParam.mc = 0.95;
[net,tr] = train(net,P,T);


Training with Noise 

To obtain a network not sensitive to noise, we trained with two ideal copies and
two noisy copies of the vectors in alphabet. The target vectors consist of four
copies of the vectors in targets. The noisy vectors have noise with mean 0 and
standard deviations of 0.1 and 0.2 added to them. This forces the network to learn
how to properly identify noisy letters, while requiring that it can still respond
well to ideal vectors.


To train with noise, the maximum number of epochs is reduced to 300 and the
error goal is increased to 0.6, reflecting that higher error is expected because
more vectors (including some with noise) are being presented.


netn = net;
netn.trainParam.goal = 0.6;
netn.trainParam.epochs = 300;
T = [targets targets targets targets];
for pass = 1:10
  P = [alphabet, alphabet, ...
       (alphabet + randn(R,Q)*0.1), ...
       (alphabet + randn(R,Q)*0.2)];
  [netn,tr] = train(netn,P,T);
end
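To check the shapes the training loop above produces, here is one pass of it with hypothetical stand-ins (alphabetStandIn, targetsStandIn) in place of the alphabet and targets matrices from prprob. Each pass presents 4 x 26 = 104 columns.

```matlab
% One pass of the noisy-training batch, using hypothetical stand-in data.
R = 35; Q = 26;
alphabetStandIn = double(rand(R, Q) > 0.5);   % random Boolean "letters"
targetsStandIn  = eye(Q);                     % assumed one-hot targets

P = [alphabetStandIn, alphabetStandIn, ...    % two ideal copies
     (alphabetStandIn + randn(R, Q)*0.1), ... % noise: mean 0, std 0.1
     (alphabetStandIn + randn(R, Q)*0.2)];    % noise: mean 0, std 0.2
T = [targetsStandIn targetsStandIn targetsStandIn targetsStandIn];
```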


Training Without Noise Again 

Once the network is trained with noise, it makes sense to train it without noise
once more to ensure that ideal input vectors are always classified correctly.
Therefore, the network is again trained with code identical to the "Training 
Without Noise" section. 


System Performance 


The reliability of the neural network pattern recognition system is measured
by testing the network with hundreds of input vectors with varying quantities
of noise. The script file appcr1 tests the network at various noise levels, and
then graphs the percentage of network errors versus noise. Noise with a mean
of 0 and a standard deviation from 0 to 0.5 is added to input vectors. At each
noise level, 100 presentations of different noisy versions of each letter are made
and the network's output is calculated. The output is then passed through the
competitive transfer function so that only one of the 26 outputs (representing
the letters of the alphabet) has a value of 1.


The number of erroneous classifications is then counted and converted to
percentages.
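The test procedure just described can be sketched as follows. The trained network is replaced by a nearest-template classifier and the alphabet by random Boolean columns; both are hypothetical stand-ins, so only the bookkeeping (noise levels, 100 presentations per letter, winner selection, percentage of errors) mirrors the text, not the error values appcr1 reports.

```matlab
% Noise test sketch with a nearest-template classifier standing in for the network.
R = 35; Q = 26;
templates = double(rand(R, Q) > 0.5);   % stand-in letters, one per column
noiseLevels = 0:0.05:0.5;               % noise standard deviations to test
numTests = 100;                         % noisy presentations of each letter per level
errorPct = zeros(size(noiseLevels));

for k = 1:numel(noiseLevels)
  errors = 0;
  for t = 1:numTests
    X = templates + randn(R, Q) * noiseLevels(k);   % noisy copy of every letter
    % d(j,i) = squared distance between template j and noisy column i.
    d = bsxfun(@plus, sum(templates.^2, 1)', sum(X.^2, 1)) - 2 * (templates' * X);
    [~, guess] = min(d, [], 1);         % winning letter for each noisy column
    errors = errors + sum(guess ~= (1:Q));
  end
  errorPct(k) = 100 * errors / (Q * numTests);  % percentage of misclassifications
end
```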
[Figure: Percentage of Recognition Errors versus Noise Level (0 to 0.5) for Network 1 and Network 2]


The solid line on the graph shows the reliability for the network trained with
and without noise. The reliability of the same network when it had only been
trained without noise is shown with a dashed line. Thus, training the network
on noisy input vectors greatly reduces its errors when it has to classify noisy
vectors.


The network did not make any errors for vectors with noise of standard deviation
0.00 or 0.05. When noise of standard deviation 0.2 was added to the vectors both
networks began making errors.


If a higher accuracy is needed, the network can be trained for a longer time or
retrained with more neurons in its hidden layer. Also, the resolution of the
input vectors can be increased to a 10-by-14 grid. Finally, the network could be
trained on input vectors with greater amounts of noise if greater reliability
were needed for higher levels of noise.


To test the system, a letter with noise can be created and presented to the
network.


noisyJ = alphabet(:,10) + randn(35,1)*0.2;
plotchar(noisyJ);
A2 = sim(net,noisyJ);
A2 = compet(A2);
answer = find(compet(A2) == 1);
plotchar(alphabet(:,answer));




Here is the noisy letter and the letter the network picked (correctly). 





Summary 


This problem demonstrates how a simple pattern recognition system can be 
designed. Note that the training process did not consist of a single call to a 
training function. Instead, the network was trained several times on various 
input vectors.


In this case, training a network on different sets of noisy vectors forced the
network to learn how to deal with noise, a common problem in the real world.
Custom Networks


The Neural Network Toolbox is designed to allow for many kinds of networks.
This makes it possible for many functions to use the same network object data
type.


Here are all the standard toolbox network creation functions. 





New Networks

newc      Create a competitive layer.
newcf     Create a cascade-forward backpropagation network.
newelm    Create an Elman backpropagation network.
newff     Create a feed-forward backpropagation network.
newfftd   Create a feed-forward input-delay backprop network.
newgrnn   Design a generalized regression neural network.
newhop    Create a Hopfield recurrent network.
newlin    Create a linear layer.
newlind   Design a linear layer.
newlvq    Create a learning vector quantization network.
newp      Create a perceptron.
newpnn    Design a probabilistic neural network.
newrb     Design a radial basis network.
newrbe    Design an exact radial basis network.
newsom    Create a self-organizing map.





This flexibility is possible because we have an object-oriented representation
for networks. The representation allows various architectures to be defined
and allows various algorithms to be assigned to those architectures.
To create custom networks, start with an empty network (obtained with the
network function) and set its properties as desired.


network - Create a custom neural network.
The network object consists of many properties that you can set to specify the
structure and behavior of your network. See Chapter 13 for descriptions of
all network properties.


The following sections demonstrate how to create a custom network by using
these properties.


Custom Network


Before you can build a network you need to know what it looks like. For
dramatic purposes (and to give the toolbox a workout) this section leads you
through the creation of the wild and complicated network shown below.


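[Figure: custom network diagram. Inputs p1 (2 elements) and p2 (5 elements); layers 1 and 2; layer 3; outputs from layers 2 and 3]

$$a^1(k) = \mathrm{tansig}\left(IW^{1,1} p^1(k) + b^1\right)$$

$$a^2(k) = \mathrm{logsig}\left(IW^{2,1} \begin{bmatrix} p^1(k) \\ p^1(k-1) \end{bmatrix} + IW^{2,2} p^2(k-1)\right)$$

$$a^3(k) = \mathrm{purelin}\left(LW^{3,3} a^3(k-1) + LW^{3,1} a^1(k) + b^3 + LW^{3,2} a^2(k)\right)$$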


Each of the two elements of the first network input is to accept values ranging
between 0 and 10. Each of the five elements of the second network input ranges
from -2 to 2.
Before you can complete your design of this network, the algorithms it employs
for initialization and training must be specified.


Here we specify that each layer's weights and biases are initialized with the
Nguyen-Widrow layer initialization method (initnw). Also, the network is
trained with Levenberg-Marquardt backpropagation (trainlm), so that,
given example input vectors, the outputs of the third layer learn to match the
associated target vectors with minimal mean squared error (mse).


Network Definition


The first step is to create a new network. Type in the following code to create a
network and view its many properties.


net = network 


Architecture Properties


The first group of properties displayed is labeled architecture properties.
These properties allow you to select the number of inputs and layers, and
their connections.


Number of Inputs and Layers. The first two properties displayed are numInputs
and numLayers. These properties allow us to select how many inputs and layers 
we want our network to have. 


net = 
Neural Network object:
architecture:


numInputs: 0 
numLayers: 0 


Note that the network has no inputs or layers at this time.


Change that by setting these properties to the number of inputs and number of
layers in our custom network diagram.


net.numInputs = 2;
net.numLayers = 3;
Note that net.numInputs is the number of input sources, not the number of
elements in an input vector (net.inputs{i}.size).


Bias Connections. Type net and press Return to view its properties again. The 
network now has two inputs and three layers. 


net = 
Neural Network object:
architecture:


numInputs: 2 
numLayers: 3 


Now look at the next five properties. 


biasConnect: [0; 0; 0]
inputConnect: [0 0; 0 0; 0 0]
layerConnect: [0 0 0; 0 0 0; 0 0 0]
outputConnect: [0 0 0]
targetConnect: [0 0 0]


These matrices of 1s and 0's represent the presence or absence of bias, input
weight, layer weight, output, and target connections. They are currently all
zeros, indicating that the network does not have any such connections.


Note that the bias connection matrix is a 3-by-1 vector. To create a bias
connection to the ith layer you can set net.biasConnect(i) to 1. Specify that
the first and third layers are to have bias connections, as our diagram
indicates, by typing in the following code.


net.biasConnect(1) = 1;
net.biasConnect(3) = 1


Note that you could also define those connections with a single line of code. 


net.biasConnect = [1; 0; 1];


Input and Layer Weight Connections. The input connection matrix is 3-by-2,
representing the presence of connections from two sources (the two inputs) to
three destinations (the three layers). Thus, net.inputConnect(i,j)
represents the presence of an input weight connection going to the ith layer
from the jth input.
To connect the first input to the first and second layers, and the second input
to the second layer (as is indicated by the custom network diagram), type


net.inputConnect(1,1) = 1;
net.inputConnect(2,1) = 1; 
net.inputConnect(2,2) = 1; 


or this single line of code:
net.inputConnect = [1 0; 1 1; 0 0];
Similarly, net.layerConnect(i,j) represents the presence of a layer-weight
connection going to the ith layer from the jth layer. Connect layers 1, 2, and 3
to layer 3 as follows.


net.layerConnect = [0 0 0; 0 0 0; 1 1 1];
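Each 1 at row i, column j of these matrices means "to layer i from input (or layer) j". A small base-MATLAB sketch that lists the connections created by the two assignments above; it only reads plain copies of the matrices and does not touch a network object.

```matlab
% Connection matrices from the custom network (rows = destination layers).
inputConnect = [1 0; 1 1; 0 0];        % 3 layers x 2 inputs
layerConnect = [0 0 0; 0 0 0; 1 1 1];  % 3 layers x 3 layers

[toL, fromI]  = find(inputConnect);    % (destination layer, source input) pairs
[toL2, fromL] = find(layerConnect);    % (destination layer, source layer) pairs

for k = 1:numel(toL)
  fprintf('input %d -> layer %d\n', fromI(k), toL(k));
end
for k = 1:numel(toL2)
  fprintf('layer %d -> layer %d\n', fromL(k), toL2(k));
end
```

Because find scans column by column, this prints input 1 -> layer 1, input 1 -> layer 2, input 2 -> layer 2, and then layers 1, 2, and 3 -> layer 3.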
Output and Target Connections. Both the output and target connection matrices
are 1-by-3 matrices, indicating that they connect to one destination (the
external world) from three sources (the three layers).


To connect layers 2 and 3 to network outputs, type 


net.outputConnect = [0 1 1];


To give layer 3 a target connection, type 


net.targetConnect = [0 0 1]; 


The layer 3 target is compared to the output of layer 3 to generate an error for
use when measuring the performance of the network, or when updating the
network during training or adaption.


Number of Outputs and Targets 

Type net and press Enter to view the updated properties. The final four
architecture properties are read-only values, which means their values are
determined by the choices we make for other properties. The first two read-only
properties have the following values.


numOutputs: 2 (read-only)
numTargets: 1 (read-only) 


By defining output connections from layers 2 and 3, and a target connection
from layer 3, you specify that the network has two outputs and one target.
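The read-only counts can be reproduced by counting the 1s in the connection vectors. This is a sketch of the relationship stated above, not the toolbox's internal code.

```matlab
outputConnect = [0 1 1];          % layers 2 and 3 connect to network outputs
targetConnect = [0 0 1];          % only layer 3 has a target
numOutputs = sum(outputConnect);  % one output per connected layer
numTargets = sum(targetConnect);  % one target per connected layer
```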
Subobject Properties
The next group of properties is


Subobject Structures:


inputs: {2x1 cell} of inputs
layers: {3x1 cell} of layers
outputs: {1x3 cell} containing 2 outputs
targets: {1x3 cell} containing 1 target
biases: {3x1 cell} containing 2 biases
inputWeights: {3x2 cell} containing 3 input weights
layerWeights: {3x3 cell} containing 3 layer weights


Inputs 


When you set the number of inputs (net.numInputs) to 2, the inputs property
becomes a cell array of two input structures. Each ith input structure
(net.inputs{i}) contains additional properties associated with the ith input.


To see how the input structures are arranged, type 


net.inputs 
ans = 


[1x1 struct] 
[1x1 struct] 


To see the properties associated with the first input, type 
net.inputs{1} 

The properties appear as follows. 
ans = 


range: [0 1] 
size: 1
userdata: [1x1 struct]


Note that the range property only has one row. This indicates that the input
has only one element, which varies from 0 to 1. The size property also
indicates that this input has just one element.
The first input vector of the custom network is to have two elements ranging
from 0 to 10. Specify this by altering the range property of the first input as
follows.


net.inputs{1}.range = [0 10; 0 10];


If we examine the first input's structure again, we see that it now has the
correct size, which was inferred from the new range values.


ans =


range: [2x2 double]
size: 2
userdata: [1x1 struct]


Set the second input vector ranges to be from -2 to 2 for five elements as follows.


net.inputs{2}.range = [-2 2; -2 2; -2 2; -2 2; -2 2];
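Because all five rows are identical, the same range matrix can be built with repmat. Since the size of an input is inferred from the number of rows in its range (as shown for the first input), the row count gives the size. A base-MATLAB sketch; range2 and inputSize2 are hypothetical names.

```matlab
% The five identical rows [-2 2] built with repmat instead of typed out.
range2 = repmat([-2 2], 5, 1);
% One row per input element, so the inferred size is the row count.
inputSize2 = size(range2, 1);
```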


Layers. When we set the number of layers (net.numLayers) to 3, the layers
property becomes a cell array of three layer structures. Type the following line
of code to see the properties associated with the first layer.


net.layers{1}
ans =


dimensions: 1
distanceFcn: 'dist'
distances: 0
initFcn: 'initwb'
netInputFcn: 'netsum'
positions: 0
size: 1
topologyFcn: 'hextop'
transferFcn: 'purelin'
userdata: [1x1 struct]


Type the following three lines of code to change the first layer's size to 4
neurons, its transfer function to tansig, and its initialization function to the
Nguyen-Widrow function as required for the custom network diagram.


net.layers{1}.size = 4;
net.layers{1}.transferFcn = 'tansig';
net.layers{1}.initFcn = 'initnw';


The second layer is to have three neurons, the logsig transfer function, and be
initialized with initnw. Thus, set the second layer's properties to the desired 
values as follows. 


net.layers{2}.size = 3;
net.layers{2}.transferFcn = 'logsig';
net.layers{2}.initFcn = 'initnw';


The third layer's size and transfer function properties don't need to be changed
since the defaults match those shown in the network diagram. You only need
to set its initialization function as follows.


net.layers{3}.initFcn = 'initnw';
Outputs and Targets. Take a look at how the outputs property is arranged with
this line of code.


net.outputs 
ans = 


[] [1x1 struct] [1x1 struct] 


Note that outputs contains two output structures, one for layer 2 and one for
layer 3. This arrangement occurs automatically when net.outputConnect was
set to [0 1 1].


View the second layer's output structure with the following expression.


net.outputs{2}
ans =
size: 3
userdata: [1x1 struct]


The size is automatically set to 3 when the second layer's size
(net.layers{2}.size) is set to that value. Take a look at the third layer's
output structure if you want to verify that it also has the correct size.


Similarly, targets contains one structure representing the third layer's target.
Type these two lines of code to see how targets is arranged and to view the
third layer's target properties.


net.targets 
ans =
[] [] [1x1 struct]


net.targets{3} 
ans = 
size: 1
userdata: [1x1 struct]


Biases, Input Weights, and Layer Weights. Enter the following lines of code to see
how bias and weight structures are arranged.


net.biases 
net.inputWeights
net.layerWeights


Here are the results for typing net.biases. 


ans = 
[1x1 struct] 


[] 
[1x1 struct] 


If you examine the results you will note that each contains a structure where
the corresponding connections (net.biasConnect, net.inputConnect, and
net.layerConnect) contain a 1.


Take a look at their structures with these lines of code. 


net.biases{1}
net.biases{3}
net.inputWeights{1,1}
net.inputWeights{2,1}
net.inputWeights{2,2}
net.layerWeights{3,1}
net.layerWeights{3,2}
net.layerWeights{3,3}


For example, typing net.biases{1} results in the following output. 


ans = 
initFcn:
learn: 1
learnFcn:
learnParam:
size: 4
userdata: [1x1 struct]


Specify the weights' tap delay lines in accordance with the network diagram by
setting each weight's delays property.


net.inputWeights{2,1}.delays = [0 1];
net.inputWeights{2,2}.delays = 1;
net.layerWeights{3,3}.delays = 1;


Network Functions
Type net and press Return again to see the next set of properties.


functions:


adaptFcn: (none) 
initFcn: (none) 
performFcn: (none) 
trainFcn: (none) 
Each of these properties defines a function for a basic network operation.


Set the initialization function to initlay so the network initializes itself
according to the layer initialization functions that we have already set to
initnw, the Nguyen-Widrow initialization function.


net.initFcn = 'initlay';
This meets the initialization requirement of our network. 


Set the performance function to mse (mean squared error) and the training
function to trainlm (Levenberg-Marquardt backpropagation) to meet the final
requirement of the custom network.


net.performFcn = 'mse';
net.trainFcn = 'trainlm';


Weight and Bias Values 


Before initializing and training the network, take a look at the final group of
network properties (aside from the userdata property).


Weight and bias values:


IW: {3x2 cell} containing 3 input weight matrices
LW: {3x3 cell} containing 3 layer weight matrices
b: {3x1 cell} containing 2 bias vectors


These cell arrays contain weight matrices and bias vectors in the same
positions that the connection properties (net.inputConnect,
net.layerConnect, net.biasConnect) contain 1s and the subobject properties
(net.inputWeights, net.layerWeights, net.biases) contain structures.


Evaluating each of the following lines of code reveals that all the bias vectors
and weight matrices are set to zeros.


net.IW{1,1}, net.IW{2,1}, net.IW{2,2}
net.LW{3,1}, net.LW{3,2}, net.LW{3,3}
net.b{1}, net.b{3}


Each input weight net.IW{i,j}, layer weight net.LW{i,j}, and bias vector
net.b{i} has as many rows as the size of the ith layer (net.layers{i}.size).


Each input weight net.IW{i,j} has as many columns as the size of the jth
input (net.inputs{j}.size) multiplied by the number of its delay values
(length(net.inputWeights{i,j}.delays)).


Likewise, each layer weight has as many columns as the size of the jth layer
(net.layers{j}.size) multiplied by the number of its delay values
(length(net.layerWeights{i,j}.delays)).
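Applying these rules to the custom network gives the expected size of every weight matrix. The sketch below assumes the default delay of 0 for weights whose delays were not set above (IW{1,1}, LW{3,1}, LW{3,2}); the tables iw and lw are hypothetical bookkeeping, not toolbox properties. The 4-by-2 result for IW{1,1} agrees with the initialized matrix shown later in this section.

```matlab
% Expected weight sizes: rows = destination layer size,
% columns = source size x number of delays.
layerSize = [4 3 1];    % sizes of layers 1, 2, 3 after the settings above
inputSize = [2 5];      % sizes of inputs 1, 2

% Each row: {destination layer, source input, delays}
iw = {1, 1, 0; 2, 1, [0 1]; 2, 2, 1};
% Each row: {destination layer, source layer, delays}
lw = {3, 1, 0; 3, 2, 0; 3, 3, 1};

iwSizes = zeros(size(iw, 1), 2);
for k = 1:size(iw, 1)
  [dst, src, dly] = iw{k, :};
  iwSizes(k, :) = [layerSize(dst), inputSize(src) * numel(dly)];
  fprintf('IW{%d,%d} is %d-by-%d\n', dst, src, iwSizes(k, 1), iwSizes(k, 2));
end

lwSizes = zeros(size(lw, 1), 2);
for k = 1:size(lw, 1)
  [dst, src, dly] = lw{k, :};
  lwSizes(k, :) = [layerSize(dst), layerSize(src) * numel(dly)];
  fprintf('LW{%d,%d} is %d-by-%d\n', dst, src, lwSizes(k, 1), lwSizes(k, 2));
end
```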


Network Behavior


Initialization
Initialize your network with the following line of code. 


net = init(net)


Reference the network's biases and weights again to see how they have
changed. 


net.IW{1,1}, net.IW{2,1}, net.IW{2,2}
net.LW{3,1}, net.LW{3,2}, net.LW{3,3}
net.b{1}, net.b{3}


For example,


net.IW{1,1} 
ans =
-0.3040 0.4703 
-0.5423 -0.1395 
0.5567 0.0604 
0.2667 0.4924 
Training 


Define the following cell array of two input vectors (one with two elements, one
with five) for two time steps (i.e., two columns).


P = {[0; 0] [2; 0.5]; [2; -2; 1; 0; 1] [-1; -1; 1; 0; 1]}
We want the network to respond with the following target sequence.

T = {1 -1}
Before training, we can simulate the network to see whether the initial
network's response Y is close to the target T.

Y = sim(net,P)

Y = 


[3x1 double] [3x1 double] 
[ 0.0456] [ 0.2119] 


The second row of the cell array Y is the output sequence of the second network
output, which is also the output sequence of the third layer. The values you got
for the second row may differ from those shown due to different initial weights
and biases. However, they will almost certainly not be equal to our targets T,
which is also true of the values shown.


The next task is to prepare the training parameters. The following line of code
displays the default Levenberg-Marquardt training parameters (which were
defined when we set net.trainFcn to trainlm).


net.trainParam
The following properties should be displayed. 
ans = 


epochs: 100 
goal: 0 
max_fail: 5 
mem_reduc: 1
min_grad: 1.0000e-10
mu: 1.0000e-03
mu_dec: 0.1000
mu_inc: 10
mu_max: 1.0000e+10
show: 25
time:


Change the performance goal to 1e-10. 
net.trainParam.goal = 1e-10; 

Next, train the network with the following call.
net = train(net,P,T) 


Below is a typical training plot. 


[Figure: training plot titled "Performance is 3.91852e-16, Goal is 1e-10"; Training-Blue, Goal-Black versus Epochs]


After training you can simulate the network to see if it has learned to respond
correctly.


Y = sim(net,P)
Y =
[3x1 double] [3x1 double]
[ 1.0000] [ -1.0000]


Note that the second network output (i.e., the second row of the cell array Y),
which is also the third layer's output, does match the target sequence T.
Additional Toolbox Functions

Most toolbox functions are explained in chapters dealing with networks that
use them. However, some functions are not used by toolbox networks, but are
included as they may be useful to you in creating custom networks.


Each of these is documented in "Reference" in Chapter 14. However, the notes
given below may also prove to be helpful.


Initialization Functions


randnc


This weight initialization function generates random weight matrices whose
columns are normalized to a length of 1.


randnr


This weight initialization function generates random weight matrices whose
rows are normalized to a length of 1.


Transfer Functions


satlin


This transfer function is similar to satlins, but has a linear region going from
0 to 1 (instead of -1 to 1), and minimum and maximum values of 0 and 1
(instead of -1 and 1).


softmax


This transfer function is a softer version of the hard competitive transfer
function compet. The neuron with the largest net input gets an output closest
to one, while other neurons have outputs close to zero.


tribas


The triangular-basis transfer function is similar to the radial-basis transfer
function radbas, but has a simpler shape.
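The formulas behind these functions are short enough to sketch in base MATLAB. These are hand-written approximations, not the toolbox implementations: the normalizations use randn as the random source (an assumption), and satlinSketch, softmaxSketch, and tribasSketch are hypothetical names.

```matlab
% randnc / randnr idea: random matrix, then scale columns (or rows) to unit length.
W  = randn(4, 3);
Wc = bsxfun(@rdivide, W, sqrt(sum(W.^2, 1)));   % each column has length 1
Wr = bsxfun(@rdivide, W, sqrt(sum(W.^2, 2)));   % each row has length 1

% satlin: 0 below 0, linear from 0 to 1, 1 above 1.
satlinSketch = @(n) min(max(n, 0), 1);

% softmax: exponentials normalized per column; subtracting the column maximum
% avoids overflow without changing the result.
expShift      = @(n) exp(bsxfun(@minus, n, max(n, [], 1)));
softmaxSketch = @(n) bsxfun(@rdivide, expShift(n), sum(expShift(n), 1));

% tribas: triangle of height 1 at n = 0, reaching 0 at n = -1 and n = 1.
tribasSketch = @(n) max(1 - abs(n), 0);
```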





Learning Functions


learnh 


The Hebb weight learning function increases weights in proportion to the
product of the weight's input and the neuron's output. This allows neurons to
learn associations between their inputs and outputs.


learnhd 


The Hebb-with-decay learning function is similar to the Hebb function, but
adds a term that decreases weights exponentially at each time step. This weight
decay allows neurons to forget associations that are not reinforced regularly,
and solves the problem that the Hebb function has with weights growing
without bounds.


learnis 

The instar weight learning function moves a neuron's weight vector toward
the neuron's input vector with steps proportional to the neuron's output. This
function allows neurons to learn associations between input vectors and their
outputs.


learnos 

The outstar weight learning function acts in the opposite way to the instar
learning rule. The outstar rule moves the weight vector coming from an input
toward the output vector of a layer of neurons with step sizes proportional to
the input value. This allows inputs to learn to recall vectors when stimulated.
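The four learning rules can be written as weight-change formulas for one layer with weight matrix W (one row per neuron), input p, and output a. This sketch uses the common textbook forms; the parameter names lr (learning rate) and dr (decay rate) and the sample values are assumptions, not the toolbox's argument lists.

```matlab
lr = 0.5; dr = 0.25;     % hypothetical learning and decay rates
p = [1; 0; 1];           % input vector (R = 3)
a = [1; 0];              % layer output (S = 2)
W = [0 1 0; 1 0 1];      % current weights, one row per neuron

% Hebb: change grows with (output x input).
dW_hebb = lr * a * p';
% Hebb with decay: Hebb term minus a fraction of the current weights.
dW_hebbd = lr * a * p' - dr * W;
% Instar: row i moves toward p' by a step scaled by lr * a(i).
dW_instar = lr * bsxfun(@times, a, bsxfun(@minus, p', W));
% Outstar: column j moves toward a by a step scaled by lr * p(j).
dW_outstar = lr * bsxfun(@times, bsxfun(@minus, a, W), p');
```

Note how the inactive neuron (a(2) = 0) gets no instar change, while the inactive input (p(2) = 0) gets no outstar change.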
Custom Functions

The toolbox allows you to create and use many kinds of functions. This gives
you a great deal of control over the algorithms used to initialize, simulate,
train, and adapt your networks.


The following sections describe how to create your own versions of these kinds
of functions:
• Simulation functions
  - transfer functions
  - net input functions
  - weight functions
• Initialization functions
  - network initialization functions
  - layer initialization functions
  - weight and bias initialization functions
• Learning functions
  - network training functions
  - network adapt functions
  - network performance functions
  - weight and bias learning functions
• Self-organizing map functions
  - topology functions
  - distance functions


Simulation Functions


You can create three kinds of simulation functions: transfer, net input, and
weight functions. You can also provide associated derivative functions to
enable backpropagation learning with your functions.


Transfer Functions


Transfer functions calculate a layer's output vector (or matrix) A, given its net
input vector (or matrix) N. The only constraint on the relationship between the
output and net input is that the output must have the same dimensions as the
input.


Once defined, you can assign your transfer function to any layer of a network.
For example, the following line of code assigns the transfer function yourtf to
the second layer of a network.


net.layers{2}.transferFcn = 'yourtf';


Your transfer function then is used whenever you simulate your network.
[Y,Pf,Af] = sim(net,P,Pi,Ai)
To be a valid transfer function, your function must calculate outputs A from net
inputs N as follows:
A = yourtf(N)
where:


• N is an SxQ matrix of Q net input (column) vectors.
• A is an SxQ matrix of Q output (column) vectors.


Your transfer function must also provide information about itself using this
calling format:


info = yourtf(code) 
where the correct information is returned for each of the following string codes:


• 'version' - Returns the Neural Network Toolbox version (3.0).
• 'deriv' - Returns the name of the associated derivative function.
• 'output' - Returns the output range.
• 'active' - Returns the active input range.
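As an illustration, the numeric part of a custom transfer function can be sketched with an anonymous function. The softsign function a = n/(1 + |n|) and the name yourtfSketch are hypothetical choices; a real toolbox transfer function must be an M-file so that it can also answer the string codes listed above.

```matlab
% Hypothetical custom transfer function (softsign): a = n ./ (1 + |n|).
% Elementwise, so the output A always has the same dimensions as N.
% The 'version', 'deriv', 'output', and 'active' queries need an M-file
% and are not sketched here.
yourtfSketch = @(N) N ./ (1 + abs(N));

N = randn(3, 4);          % S = 3 neurons, Q = 4 net input vectors
A = yourtfSketch(N);      % 3x4 outputs, each strictly between -1 and 1
```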


The toolbox contains an example custom transfer function called mytf. Enter
the following lines of code to see how it is used.


help mytf 

n = -5:.1:5;
a = mytf(n);
plot(n,a)
mytf('deriv')




Enter the following command to see how mytf is implemented.
type mytf 


You can use mytf as a template to create your own transfer function.


Transfer Derivative Functions. If you want to use backpropagation with your custom
transfer function, you need to create a custom derivative function for it. The
function needs to calculate the derivative of the layer's output with respect to
its net input:


dA_dN = yourdtf(N,A)
where:


• N is an SxQ matrix of Q net input (column) vectors.
• A is an SxQ matrix of Q output (column) vectors.
• dA_dN is the SxQ derivative dA/dN.


This only works for transfer functions whose output elements are independent.
In other words, each A(i) must be a function of N(i) only. Otherwise, a
three-dimensional array is required to store the derivatives in the case of
multiple vectors (instead of a matrix as defined above). Such 3-D derivatives
are not supported at this time.
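Taking the softsign function a = n/(1 + |n|) as a hypothetical custom transfer function, its derivative can be written in terms of the output alone, since 1 - |a| = 1/(1 + |n|). The sketch below checks that elementwise derivative against a central finite difference; yourtfSketch and yourdtfSketch are hypothetical names.

```matlab
% Hypothetical softsign transfer function and its derivative.
% d/dn [n/(1+|n|)] = 1/(1+|n|)^2, and 1 - |a| = 1/(1+|n|), so dA/dN = (1 - |A|).^2.
yourtfSketch  = @(N) N ./ (1 + abs(N));
yourdtfSketch = @(N, A) (1 - abs(A)).^2;   % elementwise, same shape as N and A

N = linspace(-3, 3, 12);   % avoids n = 0, where the |n| kink hurts the finite difference
A = yourtfSketch(N);
dA_dN = yourdtfSketch(N, A);

% Compare with a central finite difference.
h = 1e-6;
dA_dN_fd = (yourtfSketch(N + h) - yourtfSketch(N - h)) / (2 * h);
maxErr = max(abs(dA_dN - dA_dN_fd));
```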


To see how the example custom transfer derivative function mydtf works, type


help mydtf

da_dn = mydtf(n,a)
subplot(2,1,1), plot(n,a)
subplot(2,1,2), plot(n,da_dn)


Use this command to see how mydtf was implemented.


type mydtf 


You can use mydtf as a template to create your own transfer derivative
functions.


Net Input Functions

Net input functions calculate a layer's net input vector (or matrix) N, given its
weighted input vectors (or matrices) Zi. The only constraints on the
relationship between the net input and the weighted inputs are that the net
input must have the same dimensions as the weighted inputs, and that the
function cannot be sensitive to the order of the weighted inputs.


Once defined, you can assign your net input function to any layer of a network.
For example, the following line of code assigns the net input function yournif to
the second layer of a network.


net.layers{2}.netInputFcn = 'yournif';


Your net input function is then used whenever you simulate your network.
[Y,Pf,Af] = sim(net,P,Pi,Ai)
To be a valid net input function, your function must calculate net inputs N from
weighted inputs Zi as follows:
N = yournif(Z1,Z2,...)
where:

• Zi is the ith S×Q matrix of Q weighted input (column) vectors.
• N is an S×Q matrix of Q net input (column) vectors.
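A minimal sketch of these requirements, assuming a summing net input (yournif here is a hypothetical anonymous function, not a toolbox file): it accepts any number of S×Q weighted inputs, returns an S×Q net input, and gives the same result whatever the argument order.

```matlab
% Hypothetical summing net input function: N = Z1 + Z2 + ... + Zk.
% cat(3, ...) stacks the S x Q inputs along a third dimension; sum(..., 3) adds them.
yournif = @(varargin) sum(cat(3, varargin{:}), 3);

z1 = [1 2; 3 4];
z2 = [10 20; 30 40];
z3 = [-1 -1; -1 -1];

n  = yournif(z1, z2, z3)    % [10 21; 32 43]
n2 = yournif(z3, z1, z2);   % same result: argument order does not matter
```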
Your net input function must also provide information about itself using this
calling format:
info = yournif(code)
where the correct information is returned for each of the following string codes:


• 'version' - Returns the Neural Network Toolbox version (3.0).
• 'deriv' - Returns the name of the associated derivative function.

The toolbox contains an example custom net input function called mynif. Enter
the following lines of code to see how it is used.


help mynif
z1 = rand(4,5);
z2 = rand(4,5);
z3 = rand(4,5);
n = mynif(z1,z2,z3)
mynif('deriv')


Enter the following command to see how mynif is implemented.


type mynif
You can use mynif as a template to create your own net input function.


Net Input Derivative Functions. If you want to use backpropagation with your
custom net input function, you need to create a custom derivative function for
it. It needs to calculate the derivative of the layer's net input with respect to
any of its weighted inputs:


dN_dZ = yourdnif(Z,N)
where:


• Z is one of the S×Q matrices of Q weighted input (column) vectors.
• N is an S×Q matrix of Q net input (column) vectors.
• dN_dZ is the S×Q derivative ∂N/∂Z.
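For common net inputs the derivative is a one-liner. The sketch below assumes two illustrative cases with hypothetical names: a summing net input, where ∂N/∂Z is all ones, and an element-wise product net input, where ∂N/∂Z = N./Z (valid only when Z has no zero elements).

```matlab
% Hypothetical derivatives following the dN_dZ = f(Z, N) calling pattern.
dsum  = @(z, n) ones(size(z));   % N = Z1 + Z2 + ...    =>  dN/dZk = 1
dprod = @(z, n) n ./ z;          % N = Z1 .* Z2 .* ...  =>  dN/dZk = N ./ Zk (Zk nonzero)

z1 = [2 3; 4 5];
z2 = [0.5 2; 1 -1];
nsum  = z1 + z2;
nprod = z1 .* z2;

dN_dZ1_sum  = dsum(z1, nsum)     % all ones
dN_dZ1_prod = dprod(z1, nprod)   % equals z2, element by element
```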


To see how the example custom net input derivative function mydnif works, type


help mydnif
dn_dz1 = mydnif(z1,n)
dn_dz2 = mydnif(z2,n)
dn_dz3 = mydnif(z3,n)


Use this command to see how mydnif was implemented.


type mydnif


You can use mydnif as a template to create your own net input derivative
functions.


Weight Functions
Weight functions calculate a weighted input vector (or matrix) Z, given an
input vector (or matrix) P and a weight matrix W.


Once defined, you can assign your weight function to any input weight or layer
weight of a network. For example, the following line of code assigns the weight
function yourwf to the weight going to the second layer from the first input of
a network.


net.inputWeights{2,1}.weightFcn = 'yourwf';
Your weight function is used whenever you simulate your network.


[Y,Pf,Af] = sim(net,P,Pi,Ai)


To be a valid weight function, your function must calculate weighted inputs Z from
inputs P and a weight matrix W as follows:


Z = yourwf(W,P) 
where: 


• W is an S×R weight matrix.
• P is an R×Q matrix of Q input (column) vectors.
• Z is an S×Q matrix of Q weighted input (column) vectors.
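To illustrate the Z = yourwf(W,P) shape contract with something other than a plain matrix product, the sketch below assumes a Euclidean-distance weight function (a hypothetical example): Z(i,q) is the distance between the ith row of W and the qth input column.

```matlab
% Hypothetical distance weight function: Z(i,q) = norm(W(i,:)' - P(:,q)).
W = [0 0; 3 4];        % S = 2 neurons, R = 2 input elements
P = [0 3; 0 4];        % R = 2 elements, Q = 2 input vectors

[S, R] = size(W);
Q = size(P, 2);
assert(size(P, 1) == R, 'W must have as many columns as P has rows');

Z = zeros(S, Q);       % S x Q weighted inputs
for i = 1:S
  for q = 1:Q
    Z(i, q) = sqrt(sum((W(i, :)' - P(:, q)).^2));
  end
end
disp(Z)                % [0 5; 5 0]
```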


Your weight function must also provide information about itself using this
calling format:


info = yourwf(code)
where the correct information is returned for each of the following string codes:


• 'version' - Returns the Neural Network Toolbox version (3.0).
• 'deriv' - Returns the name of the associated derivative function.


The toolbox contains an example custom weight function called mywf. Enter the
following lines of code to see how it is used.


help mywf
w = rand(1,5);
p = rand(5,1);
z = mywf(w,p)
mywf('deriv')


Enter the following command to see how mywf is implemented.
type mywf
You can use mywf as a template to create your own weight functions.


Weight Derivative Functions. If you want to use backpropagation with your custom
weight function, you need to create a custom derivative function for it. It needs
to calculate the derivative of the weighted inputs with respect to both the input
and the weight:


dZ_dP = mydwf('p',W,P,Z)
dZ_dW = mydwf('w',W,P,Z)
where:


• W is an S×R weight matrix.
• P is an R×Q matrix of Q input (column) vectors.
• Z is an S×Q matrix of Q weighted input (column) vectors.
• dZ_dP is the S×R derivative ∂Z/∂P.
• dZ_dW is the R×Q derivative ∂Z/∂W.

This only works for weight functions whose output consists of a sum of i terms,
where each ith term is a function of only W(i) and P(i). Otherwise, a
three-dimensional array is required to store the derivatives in the case of
multiple vectors (instead of a matrix as defined above). Such 3-D derivatives
are not supported at this time.
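A dot-product weight function Z = W*P satisfies the sum-of-terms condition above, so its derivatives fit in ordinary matrices: ∂Z/∂P is W (S×R) and ∂Z/∂W is P (R×Q). The sketch below assumes that dot-product case only and confirms both with finite differences.

```matlab
% Dot-product weight function and its derivatives (illustrative sketch).
W = [1 2 3; 4 5 6];        % S = 2, R = 3
P = [1 0; 2 1; 3 -1];      % R = 3, Q = 2
Z = W * P;                 % S x Q

dZ_dP = W;                 % S x R: dZ(s,q)/dP(r,q) = W(s,r)
dZ_dW = P;                 % R x Q: dZ(s,q)/dW(s,r) = P(r,q)

h = 1e-6;
Pp = P;  Pp(2,1) = Pp(2,1) + h;    % nudge input element r = 2 of vector q = 1
fdP = (W * Pp - Z) / h;            % column 1 should match W(:,2)

Wp = W;  Wp(1,3) = Wp(1,3) + h;    % nudge weight (s = 1, r = 3)
fdW = (Wp * P - Z) / h;            % row 1 should match P(3,:)

disp([fdP(:,1)'; dZ_dP(:,2)'])     % both rows approximately [2 5]
disp([fdW(1,:); dZ_dW(3,:)])       % both rows approximately [3 -1]
```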


To see how the example custom weight derivative function mydwf works, type


help mydwf
dz_dp = mydwf('p',w,p,z)
dz_dw = mydwf('w',w,p,z)


Use this command to see how mydwf is implemented. 
type mydwf 


You can use mydwf as a template to create your own weight derivative
functions.


Initialization Functions


You can create three kinds of initialization functions: network, layer, and
weight/bias initialization.


Network Initialization Functions


The most general kind of initialization function is the network initialization
function, which sets all the weights and biases of a network to values suitable
as a starting point for training or adaption.


Once defined, you can assign your network initialization function to a network. 


net.initFcn = 'yournif';


Your network initialization function is used whenever you initialize your
network.


net = init(net)


To be a valid network initialization function, it must take and return a
network.


net = yournif(net) 


Your function can set the network's weight and bias values in any way you
want. However, you should be careful not to alter any other properties, or to set
the weight matrices and bias vectors to the wrong size. For performance
reasons, init turns off the normal type checking for network properties before
calling your initialization function. So if you set a weight matrix to the wrong
size, it won't immediately generate an error, but could cause problems later
when you try to simulate or train the network.


You can examine the implementation of the toolbox function initlay if you are
interested in creating your own network initialization function.


Layer Initialization Functions
The layer initialization function sets all the weights and biases of a layer to 
values suitable as a starting point for training or adaption. 


Once defined, you can assign your layer initialization function to a layer of a
network. For example, the following line of code assigns the layer initialization
function yourlif to the second layer of a network.


net.layers{2}.initFcn = 'yourlif';
Layer initialization functions are only called to initialize a layer if the network
initialization function (net.initFcn) is set to the toolbox function initlay. If
this is the case, then your function is used to initialize the layer whenever you
initialize your network with init.


net = init(net) 


To be a valid layer initialization function, it must take a network and a layer
index i, and return the network after initializing the ith layer.


net = yourlif(net,i)
Your function can then set the ith layer's weight and bias values in any way
you see fit. However, you should be careful not to alter any other properties, or
to set the weight matrices and bias vectors to the wrong size.


If you are interested in creating your own layer initialization function, you can
examine the implementations of the toolbox functions initwb and initnw.


Weight and Bias Initialization Functions


A weight and bias initialization function sets the values of a single weight
matrix or bias vector to values suitable as a starting point for training or adaption.


Once defined, you can assign your initialization function to any weight and bias
in a network. For example, the following lines of code assign the weight and
bias initialization function yourwbif to the second layer's bias, and the weight
coming from the first input to the second layer.


net.biases{2}.initFcn = 'yourwbif';
net.inputWeights{2,1}.initFcn = 'yourwbif';


Weight and bias initialization functions are only called to initialize a layer if
the network initialization function (net.initFcn) is set to the toolbox function
initlay, and the layer's initialization function (net.layers{i}.initFcn) is set
to the toolbox function initwb. If this is the case, then your function is used to
initialize the weights and biases it is assigned to whenever you initialize your
network with init.


net = init(net) 
To be a valid weight and bias initialization function, it must take the number
of neurons in a layer S, and a two-column matrix PR of R rows defining the
minimum and maximum values of R inputs, and return a new weight matrix W:


W = rands(S,PR)
where:


• S is the number of neurons in the layer.
• PR is an R×2 matrix defining the minimum and maximum values of R
inputs.
• W is a new S×R weight matrix.


Your function also needs to generate a new bias vector as follows:


b = rands(S) 
where: 


• S is the number of neurons in the layer.
• b is a new S×1 bias vector.
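A minimal sketch of the two calling forms, assuming values drawn uniformly from [-1, 1]; wbinitW and wbinitB are hypothetical stand-ins for the two ways a single yourwbif would be called (a real function file could tell them apart with nargin). Only the number of rows of PR is used here; a more careful initializer might also use the input ranges themselves.

```matlab
% Hypothetical weight/bias initializers: uniform random values in [-1, 1].
wbinitW = @(S, PR) 2*rand(S, size(PR, 1)) - 1;   % S x R weight matrix
wbinitB = @(S) 2*rand(S, 1) - 1;                 % S x 1 bias vector

PR = [0 1; -2 2];     % R = 2 inputs, one [min max] row per input
W = wbinitW(4, PR)    % 4 x 2
b = wbinitB(4)        % 4 x 1
```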
To see how an example custom weight and bias initialization function works,
type
help mywbif
W = mywbif(4,[0 1; -2 2])
b = mywbif(4,[1 1])
Use this command to see how mywbif was implemented.
type mywbif


You can use mywbif as a template to create your own weight and bias
initialization function.


Learning Functions


You can create four kinds of learning functions: training, adaption,
performance, and weight/bias learning.


Training Functions


One kind of general learning function is a network training function. Training
functions repeatedly apply a set of input vectors to a network, updating the
network each time, until some stopping criterion is met. Stopping criteria can
consist of a maximum number of epochs, a minimum error gradient, an error
goal, etc.
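The loop below is a generic sketch of what a training function does, not the toolbox's calling interface. It assumes a single linear neuron (w, b) fitted by batch gradient descent on mean squared error to hypothetical data, and it stops on whichever of the three criteria above is met first.

```matlab
% Generic batch training loop (sketch): one linear neuron, a = w*p + b.
P = [0 1 2 3];               % hypothetical inputs
T = 2*P + 1;                 % hypothetical targets
w = 0;  b = 0;               % initial weight and bias
lr = 0.05;                   % learning rate (assumed value)
maxEpochs = 5000;            % stopping criterion: maximum number of epochs
goal = 1e-6;                 % stopping criterion: performance (MSE) goal
minGrad = 1e-10;             % stopping criterion: minimum gradient norm
stopReason = 'maximum epochs';

for epoch = 1:maxEpochs
  A = w*P + b;               % simulate on the whole batch
  E = T - A;                 % errors
  perf = mean(E.^2);         % mean squared error
  if perf <= goal
    stopReason = 'goal'; break
  end
  gw = -2*mean(E .* P);      % d(perf)/dw
  gb = -2*mean(E);           % d(perf)/db
  if sqrt(gw^2 + gb^2) < minGrad
    stopReason = 'minimum gradient'; break
  end
  w = w - lr*gw;             % gradient-descent update
  b = b - lr*gb;
end
fprintf('stopped (%s) after %d epochs: w = %.4f, b = %.4f\n', stopReason, epoch, w, b);
```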


Once defined, you can assign your training function to a network. 
net.trainFcn = 'yourtf';

Your training function is used whenever you train your network.
[net,tr] = train(NET,P,T,Pi,Ai)

To be a valid training function, your function must take and return a network:


[net,tr] = yourtf(net,Pd,Tl,Ai,Q,TS,VV,TV)
where:


• Pd is an Nl×Ni×TS cell array of tap delayed inputs.
  - Each Pd{i,j,ts} is the Rj×(Dij*Q) delayed input matrix to the weight
    going to the ith layer from the jth input at time step ts. (Pd{i,j,ts} is an
    empty matrix [] if the ith layer doesn't have a weight from the jth input.)
• Tl is an Nl×TS cell array of layer targets.
  - Each Tl{i,ts} is the Si×Q target matrix for the ith layer. (Tl{i,ts} is
    an empty matrix if the ith layer doesn't have a target.)
• Ai is an Nl×LD cell array of initial layer delay states.
  - Each Ai{i,k} is the Si×Q delayed ith layer output for time step ts =
    k-LD, where k goes from 1 to LD (so ts runs from 1-LD to 0).
• Q is the number of concurrent vectors.
• TS is the number of time steps.
• VV and TV are optional structures defining validation and test vectors in the
same form as the training vectors defined above: Pd, Tl, Ai, Q, and TS. Note
that the validation and testing Q and TS values can be different from each
other and from those used by the training vectors.


The dimensions above have the following definitions: 


• Nl is the number of network layers (net.numLayers).
• Ni is the number of network inputs (net.numInputs).
• Rj is the size of the jth input (net.inputs{j}.size).
• Si is the size of the ith layer (net.layers{i}.size).
• LD is the number of layer delays (net.numLayerDelays).
• Dij is the number of delay lines associated with the weight going to the ith
layer from the jth input (length(net.inputWeights{i,j}.delays)).


Your training function must also provide information about itself using this
calling format:


info = yourtf(code)
where the correct information is returned for each of the following string codes:


• 'version' - Returns the Neural Network Toolbox version (3.0).
• 'pdefaults' - Returns a structure of default training parameters.

When you set the network training function (net.trainFcn) to be your
function, the network's training parameters (net.trainParam) are automatically
set to your default structure. Those values can be altered (or not) before
training.


Your function can update the network's weight and bias values in any way you
see fit. However, you should be careful not to alter any other properties, or to
set the weight matrices and bias vectors to the wrong size. For performance
reasons, train turns off the normal type checking for network properties before
calling your training function. So if you set a weight matrix to the wrong size,
it won't immediately generate an error, but will cause problems later when you
try to simulate or adapt the network.


If you are interested in creating your own training function, you can examine
the implementations of toolbox functions such as trainc and trainr. The help
for each of these functions lists the input and output arguments they
take.


Utility Functions. If you examine training functions such as trainc, traingd, and
trainlm, note that they use a set of utility functions found in the nnet/nnutils
directory.


These functions are not listed in Chapter 14, the Reference chapter, because
they may be altered in the future. However, you can use these functions if you
are willing to take the risk that you might have to update your functions for
future versions of the toolbox. Use help on each function to view the function's
input and output arguments.


These two functions are useful for creating a new training record and
truncating it once the final number of epochs is known:

• newtr - New training record with any number of optional fields.
• cliptr - Clip training record to the final number of epochs.

These three functions calculate network signals going forward, errors, and
derivatives of performance coming back:

• calca - Calculate network outputs and other signals.
• calcerr - Calculate matrix or cell array errors.
• calcgrad - Calculate bias and weight performance gradients.

These two functions get and set a network's weight and bias values with single
vectors. Being able to treat all these adjustable parameters as a single vector
is often useful for implementing optimization algorithms:

• getx - Get all network weight and bias values as a single vector.
• setx - Set all network weight and bias values with a single vector.
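The idea behind getx and setx can be sketched in core MATLAB, assuming the adjustable parameters sit in a plain cell array (a hypothetical stand-in for a network's IW, LW, and b properties; this is not how the toolbox functions are implemented): pack each matrix column-wise into one vector, then reshape slices of that vector back to the original sizes.

```matlab
% Hypothetical parameter set: a 2x2 weight matrix, a 2x1 bias, a 1x3 weight matrix.
params = {[1 2; 3 4], [5; 6], [7 8 9]};

% Pack: stack every matrix column-wise into a single column vector.
x = cell2mat(cellfun(@(m) m(:), params(:), 'UniformOutput', false));   % 9 x 1

% Unpack: cut x into consecutive slices and restore each original size.
out = cell(size(params));
k = 0;
for n = 1:numel(params)
  cnt = numel(params{n});
  out{n} = reshape(x(k+1:k+cnt), size(params{n}));
  k = k + cnt;
end
```

An optimizer can then work on x alone and write the result back with the unpacking loop.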

These next three functions are also useful for implementing optimization
functions. One calculates all network signals going forward, including errors
and performance. One backpropagates to find the derivatives of performance
as a single vector. The third function backpropagates to find the Jacobian of
performance. This latter function is used by advanced optimization techniques
like Levenberg-Marquardt:

• calcperf - Calculate network outputs, signals, and performance.
• calcgx - Calculate weight and bias performance gradient as a single vector.
• calcjx - Calculate weight and bias performance Jacobian as a single matrix.


Adapt Functions


The other kind of general learning function is a network adapt function.
Adapt functions simulate a network, while updating the network for each time
step of the input before continuing the simulation to the next input.
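To contrast adaption with batch training, the sketch below assumes the Widrow-Hoff (LMS) rule on a single linear neuron with no bias and hypothetical data: at each time step it simulates, computes the error, and updates the weight before moving on to the next input.

```matlab
% Incremental updates, one time step at a time (Widrow-Hoff / LMS rule).
P = repmat([1 -1 2 0.5], 1, 50);   % input sequence, TS = 200 time steps
T = 3 * P;                         % hypothetical targets from t = 3*p
w = 0;                             % initial weight
lr = 0.1;                          % learning rate (assumed value)
Y = zeros(size(P));                % network output at each time step
for ts = 1:numel(P)
  Y(ts) = w * P(ts);               % simulate this time step
  e = T(ts) - Y(ts);               % error at this time step
  w = w + lr * e * P(ts);          % update before the next time step
end
fprintf('final weight: %.6f\n', w);   % converges toward 3
```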


Once defined, you can assign your adapt function to a network.


net.adaptFcn = 'youraf';


Your adapt function is used whenever you adapt your network.
[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai)


To be a valid adapt function, it must take and return a network:


[net,Ac,El] = youraf(net,Pd,Tl,Ai,Q,TS)
where:


• Pd is an Nl×Ni×TS cell array of tap delayed inputs.
  - Each Pd{i,j,ts} is the Rj×(Dij*Q) delayed input matrix to the weight
    going to the ith layer from the jth input at time step ts. (Note that
    Pd{i,j,ts} is an empty matrix [] if the ith layer doesn't have a weight
    from the jth input.)
• Tl is an Nl×TS cell array of layer targets.
  - Each Tl{i,ts} is the Si×Q target matrix for the ith layer. (Note that
    Tl{i,ts} is an empty matrix if the ith layer doesn't have a target.)
• Ai is an Nl×LD cell array of initial layer delay states.
  - Each Ai{i,k} is the Si×Q delayed ith layer output for time step ts =
    k-LD, where k goes from 1 to LD (so ts runs from 1-LD to 0).
• Q is the number of concurrent vectors.
• TS is the number of time steps.
The dimensions above have the following definitions:


• Nl is the number of network layers (net.numLayers).
• Ni is the number of network inputs (net.numInputs).
• Rj is the size of the jth input (net.inputs{j}.size).
• Si is the size of the ith layer (net.layers{i}.size).
• LD is the number of layer delays (net.numLayerDelays).
• Dij is the number of delay lines associated with the weight going to the ith
layer from the jth input (length(net.inputWeights{i,j}.delays)).


Your adapt function must also provide information about itself using this
calling format:


info = youraf(code)
where the correct information is returned for each of the following string codes:


• 'version' - Returns the Neural Network Toolbox version (3.0).
• 'pdefaults' - Returns a structure of default adapt parameters.


When you set the network adapt function (net.adaptFcn) to be your function,
the network's adapt parameters (net.adaptParam) are automatically set to your
default structure. Those values can then be altered (or not) before adapting.


Your function can update the network's weight and bias values in any way you
see fit. However, you should be careful not to alter any other properties, or to
set the weight matrices and bias vectors to the wrong size. For performance
reasons, adapt turns off the normal type checking for network properties before
calling your adapt function. So if you set a weight matrix to the wrong size, it
won't immediately generate an error, but will cause problems later when you
try to simulate or train the network.
If you are interested in creating your own adapt function, you can examine
the implementation of a toolbox function such as trains.


Utility Functions. If you examine the toolbox's only adapt function, trains, note
that it uses a set of utility functions found in the nnet/nnutils directory. The
help for each of these utility functions lists the input and output arguments
they take.


These functions are not listed in Chapter 14 because they may be altered in the
future. However, you can use these functions if you are willing to take the risk
that you will have to update your functions for future versions of the toolbox.


These three functions are useful for simulating a network and calculating its
derivatives of performance:


• calca1 - Calculate network signals for one time step.
• calce1 - Calculate layer errors for one time step.
• calcgrad - Calculate bias and weight performance gradients.


Performance Functions 


Performance functions allow a network's behavior to be graded. This is useful 
for many algorithms, such as backpropagation, which operate by adjusting 
network weights and biases to improve performance. 


Once defined, you can assign your performance function to a network.


net.performFcn = 'yourpf';


Your performance function will then be used whenever you train or
adapt your network.


[net,tr] = train(NET,P,T,Pi,Ai) 
[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai) 


To be a valid performance function, your function must be called as follows:
perf = yourpf(E,X,PP)
where:


• E is either an S×Q matrix or an Nl×TS cell array of layer errors.
  - Each E{i,ts} is the Si×Q error matrix for the ith layer. (E{i,ts} is an
    empty matrix if the ith layer doesn't have a target.)


• X is an M×1 vector of all the network's weights and biases.
• PP is a structure of network performance parameters.
If E is a cell array, you can convert it to a matrix as follows.


E = cell2mat(E);

Alternatively, your function must also be able to be called as follows:
perf = yourpf(E,net)

where you can get X and PP (if needed) as follows.


X = getx(net); 
PP = net.performParam; 
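A minimal sketch of the perf = yourpf(E,X,PP) computation, assuming a regularized mean squared error in which a hypothetical parameter PP.ratio weights the error term against the mean squared weights and biases. The example assumes every layer has a target, so cell2mat has no empty entries to cope with.

```matlab
% Hypothetical regularized MSE performance function, evaluated inline.
E = {[0.1 -0.2; 0.3 0.0], [0.5 -0.5; 0 0.1]};   % Nl = 1 layer, TS = 2 time steps
X = [1; -1; 2];                                 % M = 3 weights and biases
PP.ratio = 0.9;                                 % assumed parameter name

if iscell(E)
  E = cell2mat(E);                              % 2 x 4 matrix of errors
end
perfE = mean(E(:).^2);                          % mean squared error
perfX = mean(X.^2);                             % mean squared weights and biases
perf = PP.ratio * perfE + (1 - PP.ratio) * perfX   % 0.273125
```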


Your performance function must also provide information about itself using
this calling format:

info = yourpf(code)
where the correct information is returned for each of the following string codes:


• 'version' - Returns the Neural Network Toolbox version (3.0).
• 'deriv' - Returns the name of the associated derivative function.
• 'pdefaults' - Returns a structure of default performance parameters.
When you set the network performance function (net.performFcn) to be your
function, the network's performance parameters (net.performParam) will
automatically get set to your default structure. Those values can then be
altered (or not) before training or adaption.


To see how an example custom performance function works, type in these lines
of code.


help mypf
e = rand(4,5)
x = rand(12,1);
pp = mypf('pdefaults')
perf = mypf(e,x,pp)


Use this command to see how mypf was implemented.


type mypf
You can use mypf as a template to create your own performance
functions.


Performance Derivative Functions. If you want to use backpropagation with your
performance function, you need to create a custom derivative function for it. It
needs to calculate the derivative of the network's performance with respect to
its errors and its combined weight and bias vector:


dPerf_dE = dmsereg('e',E,X,perf,PP)
dPerf_dX = dmsereg('x',E,X,perf,PP)


where: 


• E is an Nl×TS cell array of layer errors.
  - Each E{i,ts} is the Si×Q error matrix for the ith layer. (Note that
    E{i,ts} is an empty matrix if the ith layer doesn't have a target.)
• X is an M×1 vector of all the network's weights and biases.
• PP is a structure of network performance parameters.
• dPerf_dE is the Nl×TS cell array of derivatives ∂Perf/∂E.
  - Each dPerf_dE{i,ts} is the Si×Q derivative matrix for the ith layer. (Note
    that dPerf_dE{i,ts} is an empty matrix if the ith layer doesn't have a target.)
• dPerf_dX is the M×1 derivative ∂Perf/∂X.
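For a regularized mean squared error of the assumed form perf = ratio*mean(E(:).^2) + (1-ratio)*mean(X.^2), with ratio a hypothetical parameter, the derivatives are 2*ratio*E/numel(E) and 2*(1-ratio)*X/numel(X). The lines below use a single error matrix rather than a cell array and check one element of each derivative against a finite difference.

```matlab
% Derivatives of an assumed regularized MSE, checked by finite differences.
E = [0.1 -0.2 0.5; 0.3 0 0.1];     % S x Q errors (one layer, one time step)
X = [1; -1; 2];                    % M x 1 weights and biases
ratio = 0.9;                       % hypothetical performance parameter
perfFcn = @(E, X) ratio*mean(E(:).^2) + (1 - ratio)*mean(X.^2);

dPerf_dE = 2*ratio*E / numel(E);          % same size as E
dPerf_dX = 2*(1 - ratio)*X / numel(X);    % M x 1

h = 1e-6;
Ep = E;  Ep(1,2) = Ep(1,2) + h;
fdE = (perfFcn(Ep, X) - perfFcn(E, X)) / h;   % approximates dPerf_dE(1,2)
Xp = X;  Xp(3) = Xp(3) + h;
fdX = (perfFcn(E, Xp) - perfFcn(E, X)) / h;   % approximates dPerf_dX(3)
disp([fdE dPerf_dE(1,2); fdX dPerf_dX(3)])
```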
To see how the example custom performance derivative function mydpf works,
type
help mydpf
e = {e};
dperf_de = mydpf('e',e,x,perf,pp)
dperf_dx = mydpf('x',e,x,perf,pp)
Use this command to see how mydpf was implemented.
type mydpf


You can use mydpf as a template to create your own performance derivative
functions.


Weight and Bias Learning Functions


The most specific kind of learning function is a weight and bias learning
function. These functions are used to update individual weights and biases
during learning with some training and adapt functions.


Once defined, you can assign your learning function to any weight and bias in
a network. For example, the following lines of code assign the weight and bias
learning function yourwblf to the second layer's bias, and the weight coming
from the first input to the second layer.


net.biases{2}.learnFcn = 'yourwblf';
net.inputWeights{2,1}.learnFcn = 'yourwblf';


Weight and bias learning functions are only called to update weights and
biases if the network training function (net.trainFcn) is set to trainb, trainc,
or trainr, or if the network adapt function (net.adaptFcn) is set to trains. If
this is the case, then your function is used to update the weights and biases it is
assigned to whenever you train or adapt your network with train or adapt.


[net,tr] = train(NET,P,T,Pi,Ai) 
[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai) 


To be a valid weight and bias learning function, it must be callable as follows:
[dW,LS] = yourwblf(W,P,Z,N,A,T,E,gW,gA,D,LP,LS)
where:


• W is an S×R weight matrix.
• P is an R×Q matrix of Q input (column) vectors.
• Z is an S×Q matrix of Q weighted input (column) vectors.
• N is an S×Q matrix of Q net input (column) vectors.
• A is an S×Q matrix of Q layer output (column) vectors.
• T is an S×Q matrix of Q target (column) vectors.
• E is an S×Q matrix of Q error (column) vectors.
• gW is an S×R gradient of W with respect to performance.
• gA is an S×Q gradient of A with respect to performance.
• D is an S×S matrix of neuron distances.
• LP is a structure of learning parameters.
• LS is a structure of the learning state that is updated for each call. (Use a null
matrix [] the first time.)
• dW is the resulting S×R weight change matrix.


Your function is called as follows to update a bias vector:


[db,LS] = yourwblf(b,ones(1,Q),Z,N,A,T,E,gW,gA,D,LP,LS)
where:


• S is the number of neurons in the layer.
• b is the S×1 bias vector being updated.
• db is the resulting S×1 bias change vector.
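As an illustration of the shapes involved, the sketch below assumes the Widrow-Hoff rule dW = lr*E*P' with a hypothetical learning rate lr standing in for a field of LP: E*P' is S×R like W, and passing ones(1,Q) in place of P, as in the bias call above, turns the same rule into an S×1 bias change.

```matlab
% Widrow-Hoff style weight and bias changes (illustrative sketch).
lr = 0.1;                          % stands in for a field of LP (assumed)
P  = [1 2; 3 4];                   % R = 2 input elements, Q = 2 vectors
E  = [0.5 -1];                     % S = 1 neuron, Q = 2 errors
Q  = size(P, 2);

% (S x Q) * (Q x R) -> S x R change, the same size as the weight matrix
learnRule = @(Pin, Err) lr * Err * Pin';

dW = learnRule(P, E)               % approximately [-0.15 -0.25]
db = learnRule(ones(1, Q), E)      % approximately -0.05, an S x 1 bias change
```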

Your learning function must also provide information about itself using this
calling format:


info = yourwblf(code)
where the correct information is returned for each of the following string codes:


• 'version' - Returns the Neural Network Toolbox version (3.0).
• 'deriv' - Returns the name of the associated derivative function.
• 'pdefaults' - Returns a structure of default learning parameters.
To see how an example custom weight and bias learning function works,
type
help mywblf


Use this command to see how mywblf was implemented.
type mywblf


You can use mywblf as a template to create your own weight and bias learning
functions.


Self-Organizing Map Functions


There are two kinds of functions that control how neurons in self-organizing
maps respond. They are topology and distance functions.


Topology Functions 


Topology functions calculate the positions of a layer's neurons given its
dimensions.


Once defined, you can assign your topology function to any layer of a network.
For example, the following line of code assigns the topology function yourtopf
to the second layer of a network.


net.layers{2}.topologyFcn = 'yourtopf';


Your topology function is used whenever your network is trained or adapts.
[net,tr] = train(NET,P,T,Pi,Ai)
[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai)
To be a valid topology function, your function must calculate positions pos from
dimensions dim as follows:
pos = yourtopf(dim1,dim2,...,dimN)
where:
• dimi is the number of neurons along the ith dimension of the layer.
• pos is an N×S matrix of S position vectors, where S is the total number of
neurons, defined by the product dim1*dim2*...*dimN.
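A minimal sketch of a grid-style topology, assuming neurons sit on integer coordinates starting at 0 with the first dimension varying fastest (an illustrative choice, not a description of mytopf or any toolbox topology). ind2sub supplies the subscripts, so the same lines work when dims lists two or more dimensions.

```matlab
% Hypothetical grid topology: each neuron sits at its zero-based subscripts.
dims = [3 2];                     % dim1 = 3, dim2 = 2
N = numel(dims);                  % number of dimensions
S = prod(dims);                   % total number of neurons
subs = cell(1, N);
[subs{:}] = ind2sub(dims, 1:S);   % subscripts of every neuron, first dimension fastest
pos = zeros(N, S);                % N x S matrix of position vectors
for k = 1:N
  pos(k, :) = subs{k} - 1;        % shift so coordinates start at 0
end
disp(pos)                         % [0 1 2 0 1 2; 0 0 0 1 1 1]
```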


The toolbox contains an example custom topology function called mytopf. Enter
the following lines of code to see how it is used.

help mytopf
pos = mytopf(20,20);
plotsom(pos)


If you type that code, you get the following plot.

[Figure: Neuron Positions. Plot of position(2,i) against position(1,i) for each neuron.]


Enter the following command to see how mytopf is implemented.
type mytopf


You can use mytopf as a template to create your own topology function. 


Distance Functions


Distance functions calculate the distances of a layer's neurons given their
positions.


Once defined, you can assign your distance function to any layer of a network.
For example, the following line of code assigns the distance function yourdistf
to the second layer of a network.


net.layers{2}.distanceFcn = 'yourdistf';


Your distance function is used whenever your network is trained or adapts. 


[net,tr] = train(NET,P,T,Pi,Ai)
[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai)
To be a valid distance function, it must calculate distances d from positions pos
as follows:


d = yourdistf(pos)
where:


• pos is an N×S matrix of S neuron position vectors.
• d is an S×S matrix of neuron distances.
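A minimal sketch, assuming Euclidean distance between neuron positions (other measures, such as counting grid steps, would also satisfy the contract): d(i,j) is the distance between columns i and j of pos, giving a symmetric S×S matrix with zeros on the diagonal. The positions are a hypothetical 3-by-2 grid.

```matlab
% Hypothetical Euclidean distance function applied to a 3-by-2 grid of positions.
pos = [0 1 2 0 1 2;
       0 0 0 1 1 1];               % N x S position vectors (N = 2, S = 6)
S = size(pos, 2);
d = zeros(S);                      % S x S distances
for i = 1:S
  for j = 1:S
    d(i, j) = sqrt(sum((pos(:, i) - pos(:, j)).^2));
  end
end
disp(d(1, :))                      % distances from neuron 1 to every neuron
```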


The toolbox contains an example custom distance function called mydistf.
Enter the following lines of code to see how it is used.


help mydistf
pos = gridtop(4,5);
d = mydistf(pos)


Enter the following command to see how mydistf is implemented.
type mydistf


You can use mydistf as a template to create your own distance function.
Network Properties


These properties define the basic features of a network. A later section,
“Subobject Properties,” describes properties that define network details.


Architecture


These properties determine the number of network subobjects (which include
inputs, layers, outputs, targets, biases, and weights), and how they are
connected.


numInputs
This property defines the number of inputs a network receives.


net.numInputs 
It can be set to 0 or a positive integer. 


Clarification. The number of network inputs and the size of a network input are
not the same thing. The number of inputs defines how many sets of vectors the
network receives as input. The size of each input (i.e., the number of elements
in each input vector) is determined by the input size (net.inputs{i}.size).


Most networks have only one input, whose size is determined by the problem. 
Side Effects. Any change to this property results in a change in the size of the
matrix defining connections to layers from inputs (net.inputConnect) and the
size of the cell array of input subobjects (net.inputs).


numLayers 
This property defines the number of layers a network has.


net.numLayers
It can be set to 0 or a positive integer. 
Side Effects. Any change to this property changes the size of each of these
Boolean matrices that define connections to and from layers:


net.biasConnect
net.inputConnect
net.layerConnect


net.outputConnect 
net.targetConnect 


and changes the size of each cell array of subobject structures whose size depends
on the number of layers:


net.biases
net.inputWeights
net.layerWeights
net.outputs
net.targets


and also changes the size of each of the network's adjustable parameter
properties.


net.IW
net.LW
net.b


biasConnect 
This property defines which layers have biases. 


net.biasConnect 
It can be set to any Nl×1 matrix of Boolean values, where Nl is the number
of network layers (net.numLayers). The presence (or absence) of a bias to the
ith layer is indicated by a 1 (or 0) at:


net.biasConnect(i)
Side Effects. Any change to this property alters the presence or absence of
structures in the cell array of biases (net.biases) and the presence or
absence of vectors in the cell array of bias vectors (net.b).


inputConnect 
This property defines which layers have weights coming from inputs.


net.inputConnect 
It can be set to any Nl×Ni matrix of Boolean values, where Nl is the number
of network layers (net.numLayers), and Ni is the number of network inputs
(net.numInputs). The presence (or absence) of a weight going to the ith layer
from the jth input is indicated by a 1 (or 0) at:
net.inputConnect(i,j)


Side Effects. Any change to this property will alter the presence or absence of
structures in the cell array of input weight subobjects (net.inputWeights) and
the presence or absence of matrices in the cell array of input weight matrices
(net.IW).


layerConnect 
This property defines which layers have weights coming from other layers.


net.layerConnect


It can be set to any Nl×Nl matrix of Boolean values, where Nl is the number
of network layers (net.numLayers). The presence (or absence) of a weight going
to the ith layer from the jth layer is indicated by a 1 (or 0) at:


net.layerConnect(i,j)
Side Effects. Any change to this property will alter the presence or absence of
structures in the cell array of layer weight subobjects (net.layerWeights) and
the presence or absence of matrices in the cell array of layer weight matrices
(net.LW).


outputConnect
This property defines which layers generate network outputs. 


net.outputConnect 


It can be set to any 1 x Nl matrix of Boolean values, where Nl is the number
of network layers (net.numLayers). The presence (or absence) of a network
output from the ith layer is indicated by a 1 (or 0) at:

net.outputConnect(i)
Side Effects. Any change to this property alters the number of network
outputs (net.numOutputs) and the presence or absence of structures in the cell
array of output subobjects (net.outputs).


targetConnect
This property defines which layers have associated targets. 


net.targetConnect 


Network Properties





It can be set to any 1 x Nl matrix of Boolean values, where Nl is the number
of network layers (net.numLayers). The presence (or absence) of a target
associated with the ith layer is indicated by a 1 (or 0) at:

net.targetConnect(i)
Side Effects. Any change to this property alters the number of network targets
(net.numTargets) and the presence or absence of structures in the cell array of
target subobjects (net.targets).


numOutputs (read-only) 
This property indicates how many outputs the network has. 


net.numOutputs


It is always set to the number of 1s in the matrix of output connections. 


numOutputs = sum(net.outputConnect)


numTargets (read-only) 
This property indicates how many targets the network has.


net.numTargets


It is always set to the number of 1s in the matrix of target connections.


numTargets = sum(net.targetConnect)
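A minimal plain-MATLAB sketch of the connection properties and the two read-only counts. It assumes no toolbox: net here is a hypothetical ordinary struct that only mimics the connection fields of a two-layer feedforward network with one input, not a real network object.

```matlab
% Hypothetical stand-in for the connection properties of a two-layer
% feedforward network with one input (plain struct, not a network object).
net = struct();
net.numInputs = 1;
net.numLayers = 2;
net.biasConnect   = [1; 1];      % Nl x 1: both layers have biases
net.inputConnect  = [1; 0];      % Nl x Ni: input 1 feeds layer 1 only
net.layerConnect  = [0 0; 1 0];  % Nl x Nl: layer 1 feeds layer 2
net.outputConnect = [0 1];       % 1 x Nl: layer 2 is the network output
net.targetConnect = [0 1];       % 1 x Nl: layer 2 has a target

% Read-only counts, computed as described above
numOutputs = sum(net.outputConnect);
numTargets = sum(net.targetConnect);

% Existing weights: row index = destination layer, column index = source
[toLayer, fromInput] = find(net.inputConnect);
[toLayer2, fromLayer] = find(net.layerConnect);
fprintf('numOutputs = %d, numTargets = %d\n', numOutputs, numTargets);
fprintf('input weight: input %d -> layer %d\n', [fromInput toLayer]');
fprintf('layer weight: layer %d -> layer %d\n', [fromLayer toLayer2]');
```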


numInputDelays (read-only)


This property indicates the number of time steps of past inputs that must be
supplied to simulate the network.


net.numInputDelays 


It is always set to the maximum delay value associated with any of the
network's input weights:


numInputDelays = 0;
for i=1:net.numLayers
  for j=1:net.numInputs
    if net.inputConnect(i,j)
      numInputDelays = max( ...
        [numInputDelays net.inputWeights{i,j}.delays]);
    end
  end
end


numLayerDelays (read-only) 
This property indicates the number of time steps of past layer outputs that
must be supplied to simulate the network.


net.numLayerDelays


It is always set to the maximum delay value associated with any of the
network's layer weights:


numLayerDelays = 0;
for i=1:net.numLayers
  for j=1:net.numLayers
    if net.layerConnect(i,j)
      numLayerDelays = max( ...
        [numLayerDelays net.layerWeights{i,j}.delays]);
    end
  end
end
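The two delay loops can be exercised without the toolbox on a hypothetical stand-in struct. The connection pattern and delay values below are made up for illustration; only the field names follow this section.

```matlab
% Hypothetical stand-in: one input, two layers, and layer 2 feeding back
% into layer 1 (plain struct, not a network object).
net = struct();
net.numInputs = 1;
net.numLayers = 2;
net.inputConnect = [1; 0];
net.layerConnect = [0 1; 1 0];
net.inputWeights = cell(2, 1);
net.inputWeights{1,1} = struct('delays', [0 1 2]);  % input at t, t-1, t-2
net.layerWeights = cell(2, 2);
net.layerWeights{2,1} = struct('delays', 0);        % layer 1 -> layer 2
net.layerWeights{1,2} = struct('delays', [1 3]);    % layer 2 -> layer 1

% Same loops as in the text above
numInputDelays = 0;
for i = 1:net.numLayers
  for j = 1:net.numInputs
    if net.inputConnect(i,j)
      numInputDelays = max([numInputDelays net.inputWeights{i,j}.delays]);
    end
  end
end

numLayerDelays = 0;
for i = 1:net.numLayers
  for j = 1:net.numLayers
    if net.layerConnect(i,j)
      numLayerDelays = max([numLayerDelays net.layerWeights{i,j}.delays]);
    end
  end
end
fprintf('numInputDelays = %d, numLayerDelays = %d\n', numInputDelays, numLayerDelays);
```

With these values the input weight needs two past inputs and the feedback weight needs three past layer outputs.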


Subobject Structures


These properties consist of cell arrays of structures that define each of the
network's inputs, layers, outputs, targets, biases, and weights.


The properties for each kind of subobject are described in the “Subobject
Properties” section that follows this “Network Properties” section.


inputs 
This property holds structures of properties for each of the network's inputs.


net.inputs 


It is always an Ni x 1 cell array of input structures, where Ni is the number
of network inputs (net.numInputs).


The structure defining the properties of the ith network input is located at:


net.inputs{i}







Input Properties. See “Inputs” in the “Subobject Properties” section for
descriptions of input properties.


layers 
This property holds structures of properties for each of the network's layers.


net.layers


It is always an Nl x 1 cell array of layer structures, where Nl is the number of
network layers (net.numLayers).


The structure defining the properties of the ith layer is located at:
net.layers{i}


Layer Properties. See “Layers” in the “Subobject Properties” section for
descriptions of layer properties.


outputs
This property holds structures of properties for each of the network's outputs.


net.outputs 


It is always a 1 x Nl cell array, where Nl is the number of network layers
(net.numLayers).


The structure defining the properties of the output from the ith layer (or a null
matrix []) is located at

net.outputs{i}
if the corresponding output connection is 1 (or 0):

net.outputConnect(i)


Output Properties. See “Outputs” in the “Subobject Properties” section for
descriptions of output properties.


targets
This property holds structures of properties for each of the network's targets.


net.targets 




13 Network Object Reference







It is always a 1 x Nl cell array, where Nl is the number of network layers
(net.numLayers).


The structure defining the properties of the target associated with the ith
layer (or a null matrix []) is located at

net.targets{i}
if the corresponding target connection is 1 (or 0):
net.targetConnect(i)


Target Properties. See “Targets” in the “Subobject Properties” section for
descriptions of target properties.


biases 
This property holds structures of properties for each of the network's biases.


net.biases 


It is always an Nl x 1 cell array, where Nl is the number of network layers
(net.numLayers).


The structure defining the properties of the bias associated with the ith layer
(or a null matrix []) is located at

net.biases{i}
if the corresponding bias connection is 1 (or 0):
net.biasConnect(i)


Bias Properties. See “Biases” in the “Subobject Properties” section for
descriptions of bias properties.


inputWeights
This property holds structures of properties for each of the network's input
weights.


net.inputWeights


It is always an Nl x Ni cell array, where Nl is the number of network layers
(net.numLayers), and Ni is the number of network inputs (net.numInputs).







The structure defining the properties of the weight going to the ith layer from
the jth input (or a null matrix []) is located at

net.inputWeights{i,j}
if the corresponding input connection is 1 (or 0):
net.inputConnect(i,j)


Input Weight Properties. See “Input Weights” in the “Subobject Properties” section
for descriptions of input weight properties.


layerWeights
This property holds structures of properties for each of the network's layer
weights.


net.layerWeights


It is always an Nl x Nl cell array, where Nl is the number of network layers
(net.numLayers). 


The structure defining the properties of the weight going to the ith layer from
the jth layer (or a null matrix []) is located at:

net.layerWeights{i,j}

if the corresponding layer connection is 1 (or 0):

net.layerConnect(i,j)


Layer Weight Properties. See “Layer Weights” in the “Subobject Properties”
section for descriptions of layer weight properties.


Functions


These properties define the algorithms to use when a network is to adapt, is to
be initialized, is to have its performance measured, or is to be trained.


adaptFcn 
This property defines the function to be used when the network adapts. 


net.adaptFcn 
It can be set to the name of any network adapt function, including this toolbox
function:


trains - By-weight-and-bias network adaption function.


The network adapt function is used to perform adaption whenever adapt is 
called. 


[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai)


Custom Functions. See Chapter 12 for information on creating custom adapt
functions.


Side Effects. Whenever this property is altered, the network's adaption
parameters (net.adaptParam) are set to contain the parameters and default
values of the new function.


initFcn


This property defines the function used to initialize the network's weight
matrices and bias vectors.


net.initFcn 


It can be set to the name of any network initialization function, including this
toolbox function:


initlay - Layer-by-layer network initialization function.


The initialization function is used to initialize the network whenever init is 
called. 


net = init(net) 


Custom Functions. See Chapter 12 for information on creating custom
initialization functions.


Side Effects. Whenever this property is altered, the network's initialization
parameters (net.initParam) are set to contain the parameters and default
values of the new function.


performFcn 
This property defines the function used to measure the network's performance. 







net.performFcn 


It can be set to the name of any performance function, including these toolbox
functions:





Performance Functions

mae        Mean absolute error performance function.
mse        Mean squared error performance function.
msereg     Mean squared error w/reg performance function.
sse        Sum squared error performance function.





The performance function is used to calculate network performance during 
training whenever train is called. 


[net,tr] = train(NET,P,T,Pi,Ai) 


Custom Functions. See “Advanced Topics” in Chapter 12 for information on
creating custom performance functions.


Side Effects. Whenever this property is altered, the network's performance
parameters (net.performParam) are set to contain the parameters and default
values of the new function.
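For reference, a plain-MATLAB sketch of the quantities that mae, mse, and sse are described as measuring, computed directly from a hypothetical error matrix E = T - Y (targets minus outputs). It does not call the toolbox functions, and msereg, which adds a regularization term, is not shown.

```matlab
% Hypothetical targets and network outputs (one column per sample)
T = [1 0 1 1];
Y = [0.8 0.1 0.6 1.0];
E = T - Y;                     % errors, approximately [0.2 -0.1 0.4 0]

% Names end in _val to avoid shadowing the toolbox functions
sse_val = sum(E(:).^2);        % sum squared error
mse_val = mean(E(:).^2);       % mean squared error
mae_val = mean(abs(E(:)));     % mean absolute error
fprintf('sse = %.4f, mse = %.4f, mae = %.4f\n', sse_val, mse_val, mae_val);
```

Because mse is sse divided by the number of error elements, the two rank networks identically on a fixed data set; mae weights large errors less heavily than either.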


trainFcn 
This property defines the function used to train the network. 


net.trainFcn


It can be set to the name of any training function, including these toolbox
functions:





Training Functions

trainbfg   BFGS quasi-Newton backpropagation.
trainbr    Bayesian regularization.
traincgb   Powell-Beale conjugate gradient backpropagation.
traincgf   Fletcher-Reeves conjugate gradient backpropagation.
traincgp   Polak-Ribiere conjugate gradient backpropagation.
traingd    Gradient descent backpropagation.
traingda   Gradient descent with adaptive lr backpropagation.
traingdm   Gradient descent with momentum backpropagation.
traingdx   Gradient descent with momentum and adaptive lr backprop.
trainlm    Levenberg-Marquardt backpropagation.
trainoss   One-step secant backpropagation.
trainrp    Resilient backpropagation (Rprop).
trainscg   Scaled conjugate gradient backpropagation.
trainb     Batch training with weight and bias learning rules.
trainc     Cyclical order incremental training with learning functions.
trainr     Random order incremental training with learning functions.





The training function is used to train the network whenever train is called. 


[net,tr] = train(NET,P,T,Pi,Ai) 


Custom Functions. See “Advanced Topics” in Chapter 12 for information on
creating custom training functions.


Side Effects. Whenever this property is altered, the network's training
parameters (net.trainParam) are set to contain the parameters and default
values of the new function.


Parameters 


adaptParam 
This property defines the parameters and values of the current adapt function.
net.adaptParam


The fields of this property depend on the current adapt function
(net.adaptFcn). Evaluate the above reference to see the fields of the current
adapt function.


Call help on the current adapt function to get a description of what each field
means:


help(net.adaptFcn) 


initParam 


This property defines the parameters and values of the current initialization
function.


net.initParam


The fields of this property depend on the current initialization function
(net.initFcn). Evaluate the above reference to see the fields of the current
initialization function.


Call help on the current initialization function to get a description of what each
field means:


help(net.initFcn) 


performParam 


This property defines the parameters and values of the current performance
function.


net.performParam 


The fields of this property depend on the current performance function
(net.performFcn). Evaluate the above reference to see the fields of the current
performance function.


Call help on the current performance function to get a description of what each
field means:


help(net.performFcn) 
trainParam


This property defines the parameters and values of the current training
function.


net.trainParam 


The fields of this property depend on the current training function
(net.trainFcn). Evaluate the above reference to see the fields of the current
training function.


Call help on the current training function to get a description of what each field
means:


help(net.trainFcn) 


Weight and Bias Values 


These properties define the network's adjustable parameters: its weight 
matrices and bias vectors. 


IW 


This property defines the weight matrices of weights going to layers from
network inputs.


net.IW


It is always an Nl x Ni cell array, where Nl is the number of network layers
(net.numLayers), and Ni is the number of network inputs (net.numInputs).


The weight matrix for the weight going to the ith layer from the jth input (or a
null matrix []) is located at

net.IW{i,j}
if the corresponding input connection is 1 (or 0):

net.inputConnect(i,j)


The weight matrix has as many rows as the size of the layer it goes to
(net.layers{i}.size). It has as many columns as the product of the input size
with the number of delays associated with the weight:

net.inputs{j}.size * length(net.inputWeights{i,j}.delays)


These dimensions can also be obtained from the input weight properties. 







net.inputWeights{i,j}.size
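The row and column rule above can be checked with a few lines of plain MATLAB. The sizes below are hypothetical stand-ins for net.layers{i}.size, net.inputs{j}.size, and net.inputWeights{i,j}.delays; no network object is created.

```matlab
% Hypothetical sizes: layer i has 10 neurons, input j has 3 elements,
% and the weight uses the tapped delay line [0 1 2].
layerSize = 10;            % stands in for net.layers{i}.size
inputSize = 3;             % stands in for net.inputs{j}.size
delays    = [0 1 2];       % stands in for net.inputWeights{i,j}.delays

numRows = layerSize;                     % one row per neuron in layer i
numCols = inputSize * length(delays);    % one column per delayed input element
W = zeros(numRows, numCols);             % shape net.IW{i,j} would have
disp(size(W))
```

Adding a delay to the tapped delay line therefore adds another inputSize columns to the matrix, while the number of rows stays tied to the destination layer.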


LW 
This property defines the weight matrices of weights going to layers from other
layers.


net.LW


It is always an Nl x Nl cell array, where Nl is the number of network layers
(net.numLayers). 


The weight matrix for the weight going to the ith layer from the jth layer (or a
null matrix []) is located at

net.LW{i,j}
if the corresponding layer connection is 1 (or 0):

net.layerConnect(i,j)


The weight matrix has as many rows as the size of the layer it goes to
(net.layers{i}.size). It has as many columns as the product of the size of the
layer it comes from with the number of delays associated with the weight:

net.layers{j}.size * length(net.layerWeights{i,j}.delays)
These dimensions can also be obtained from the layer weight properties:

net.layerWeights{i,j}.size


b 


This property defines the bias vectors for each layer with a bias. 


net.b 


It is always an Nl x 1 cell array, where Nl is the number of network layers
(net.numLayers). 


The bias vector for the ith layer (or a null matrix []) is located at
net.b{i}
if the corresponding bias connection is 1 (or 0):

net.biasConnect(i)
The number of elements in the bias vector is always equal to the size of the
layer it is associated with (net.layers{i}.size).


This dimension can also be obtained from the bias properties:


net.biases{i}.size


Other


The only other property is a user data property. 


userdata 


This property provides a place for users to add custom information to a network
object.


net.userdata


Only one field is predefined. It contains a secret message to all Neural Network
Toolbox users. 


net.userdata.note







Subobject Properties


These properties define the details of a network's inputs, layers, outputs,
targets, biases, and weights. 


Inputs
These properties define the details of each ith network input.


net.inputs{i}


range

This property defines the ranges of each element of the ith network input.
net.inputs{i}.range

It can be set to any Ri x 2 matrix, where Ri is the number of elements in the
input (net.inputs{i}.size), and each element in column 1 is less than the
element next to it in column 2.


Each jth row defines the minimum and maximum values of the jth input
element, in that order:

net.inputs{i}.range(j,:)


Uses. Some initialization functions use input ranges to find appropriate initial
values for input weight matrices.


Side Effects. Whenever the number of rows in this property is altered, the
input's size (net.inputs{i}.size) changes to remain consistent. The size of
any weights coming from this input (net.inputWeights{:,i}.size) and the
dimensions of their weight matrices (net.IW{:,i}) also change.
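Input ranges are usually taken from representative data. A plain-MATLAB sketch, assuming a hypothetical input matrix P with one row per input element and one column per sample, builds an Ri x 2 range matrix and checks the column 1 < column 2 condition stated above.

```matlab
% Hypothetical input data: 2 input elements (rows), 4 samples (columns)
P = [ 0  1  4  2;
     -1 -3  5  0];

% Ri x 2 matrix: [minimum maximum] of each input element, in that order
inputRange = [min(P, [], 2) max(P, [], 2)];

% Each value in column 1 must be less than its neighbor in column 2
isValid = all(inputRange(:,1) < inputRange(:,2));
disp(inputRange)
```

A constant input element would give equal minimum and maximum and fail the check, which is one reason to inspect the data before using it to set ranges.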


size
This property defines the number of elements in the ith network input.
net.inputs{i}.size


It can be set to 0 or a positive integer. 


Side Effects. Whenever this property is altered, the input's range
(net.inputs{i}.range), any input weights (net.inputWeights{:,i}.size),
and their weight matrices (net.IW{:,i}) change size to remain consistent.
userdata
This property provides a place for users to add custom information to the ith
network input.


net.inputs{i}.userdata


Only one field is predefined. It contains a secret message to all Neural Network
Toolbox users. 


net.inputs{i}.userdata.note


Layers 
These properties define the details of each ith network layer.


net.layers{i}


dimensions 

This property defines the physical dimensions of the ith layer's neurons. Being
able to arrange a layer's neurons in a multidimensional manner is important
for self-organizing maps.


net.layers{i}.dimensions


It can be set to any row vector of 0 or positive integer elements, where the
product of all the elements becomes the number of neurons in the layer
(net.layers{i}.size).


Uses. Layer dimensions are used to calculate the neuron positions within the
layer (net.layers{i}.positions) using the layer's topology function
(net.layers{i}.topologyFcn).


Side Effects. Whenever this property is altered, the layer's size
(net.layers{i}.size) changes to remain consistent. The layer's neuron
positions (net.layers{i}.positions) and the distances between the neurons
(net.layers{i}.distances) are also updated.
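The layer size implied by a dimensions vector is the product of its elements. A small plain-MATLAB sketch with hypothetical values:

```matlab
% Hypothetical layer arranged as a 4-by-5 grid of neurons
dims = [4 5];                 % stands in for net.layers{i}.dimensions
layerSize = prod(dims);       % corresponding net.layers{i}.size

% A three-dimensional arrangement works the same way
dims3 = [2 3 4];
layerSize3 = prod(dims3);
fprintf('%d neurons, %d neurons\n', layerSize, layerSize3);
```

Going the other way is not unique: a size of 20 could come from [20], [4 5], or [2 10], which is why setting size directly falls back to a one-dimensional arrangement.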


distanceFcn 

This property defines the function used to calculate distances between neurons
in the ith layer (net.layers{i}.distances) from the neuron positions
(net.layers{i}.positions). Neuron distances are used by self-organizing
maps.







net.layers{i}.distanceFcn


It can be set to the name of any distance function, including these toolbox
functions:





Distance Functions

boxdist    Distance between two position vectors.
dist       Euclidean distance weight function.
linkdist   Link distance function.
mandist    Manhattan distance weight function.





Custom Functions. See “Advanced Topics” in Chapter 12 for information on
creating custom distance functions.


Side Effects. Whenever this property is altered, the distances between the
layer's neurons (net.layers{i}.distances) are updated.
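A plain-MATLAB sketch of three of these distance measures on a hypothetical positions matrix (one column per neuron): Euclidean (as listed for dist), Manhattan (mandist), and a box distance taken as the largest per-coordinate difference. Treating boxdist this way is an assumption of the sketch; linkdist is not shown.

```matlab
% Hypothetical neuron positions: 2 coordinates (rows) x 3 neurons (columns)
pos = [0 1 3;
       0 2 0];
S = size(pos, 2);
Deuc = zeros(S); Dman = zeros(S); Dbox = zeros(S);

for a = 1:S
  for b = 1:S
    d = pos(:,a) - pos(:,b);        % coordinate differences
    Deuc(a,b) = sqrt(sum(d.^2));    % Euclidean distance
    Dman(a,b) = sum(abs(d));        % Manhattan distance
    Dbox(a,b) = max(abs(d));        % box (Chebyshev) distance
  end
end
disp(Dman)
```

All three matrices are symmetric with zeros on the diagonal; they differ only in how per-coordinate differences are combined, which changes which neurons count as neighbors.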


distances (read-only) 


This property defines the distances between neurons in the ith layer. These
distances are used by self-organizing maps.


net.layers{i}.distances


It is always set to the result of applying the layer's distance function
(net.layers{i}.distanceFcn) to the positions of the layer's neurons
(net.layers{i}.positions).


initFcn 


This property defines the initialization function used to initialize the ith layer,
if the network initialization function (net.initFcn) is initlay.


net.layers{i}.initFcn
It can be set to the name of any layer initialization function, including these
toolbox functions:





Layer Initialization Functions

initnw     Nguyen-Widrow layer initialization function.
initwb     By-weight-and-bias layer initialization function.





If the network initialization function is set to initlay, then the function indicated by
this property is used to initialize the layer's weights and biases when init is
called.


net = init(net) 


Custom Functions. See Chapter 12 for information on creating custom
initialization functions.


netInputFcn


This property defines the net input function used to calculate the ith layer's net
input, given the layer's weighted inputs and bias.


net.layers{i}.netInputFcn


It can be set to the name of any net input function, including these toolbox
functions:





Net Input Functions

netprod    Product net input function.
netsum     Sum net input function.





The net input function is used to simulate the network when sim is called. 


[Y,Pf,Af] = sim(net,P,Pi,Ai)


Custom Functions. See Chapter 12 for information on creating custom net input
functions.
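To make the two net input functions concrete, here is a plain-MATLAB sketch for a hypothetical 2-neuron layer with one input weight, one layer weight, and a bias. It assumes, as the description above suggests, that the bias is combined with the weighted inputs in the same way: added by a sum net input, multiplied element-wise by a product net input.

```matlab
% Hypothetical weights, signals, and bias for a 2-neuron layer
IW = [1 2; 0 1];   p = [1; 1];      % input weight matrix and input vector
LW = [0.5; -1];    a = 2;           % layer weight and previous layer's output
b  = [0.1; 0.2];                    % bias vector

z1 = IW * p;                        % weighted input from the input: [3; 1]
z2 = LW * a;                        % weighted input from the layer: [1; -2]

n_sum  = z1 + z2 + b;               % sum net input, element-wise addition
n_prod = z1 .* z2 .* b;             % product net input, element-wise product
disp([n_sum n_prod])
```

Every weighted input and the bias must have one element per neuron, which is why the weight matrix rows and the bias length both equal the layer size.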







positions (read-only) 
This property defines the positions of neurons in the ith layer. These positions
are used by self-organizing maps.

net.layers{i}.positions


It is always set to the result of applying the layer's topology function
(net.layers{i}.topologyFcn) to the layer's dimensions
(net.layers{i}.dimensions).


Plotting. Use plotsom to plot the positions of a layer's neurons.


For instance, if the first-layer neurons of a network are arranged with
dimensions (net.layers{1}.dimensions) of [4 5] and the topology function
(net.layers{1}.topologyFcn) is hextop, the neurons' positions can be plotted
as shown below.


plotsom(net.layers{1}.positions)


Figure: Neuron Positions (hextop layout; position(2,:) plotted against position(1,:))







size 
This property defines the number of neurons in the ith layer.


net.layers{i}.size
It can be set to 0 or a positive integer. 


Side Effects. Whenever this property is altered, the sizes of any input weights
going to the layer (net.inputWeights{i,:}.size), any layer weights going to
the layer (net.layerWeights{i,:}.size) or coming from the layer
(net.layerWeights{:,i}.size), and the layer's bias (net.biases{i}.size)
change.


The dimensions of the corresponding weight matrices (net.IW{i,:},
net.LW{i,:}, net.LW{:,i}) and biases (net.b{i}) also change.


Changing this property also changes the size of the layer's output
(net.outputs{i}.size) and target (net.targets{i}.size) if they exist.


Finally, when this property is altered, the dimensions of the layer's neurons
(net.layers{i}.dimensions) are set to the same value. (This results in a
one-dimensional arrangement of neurons. If another arrangement is required,
set the dimensions property directly instead of using size.)


topologyFcn 


This property defines the function used to calculate the ith layer's neuron
positions (net.layers{i}.positions) from the layer's dimensions
(net.layers{i}.dimensions).


net.layers{i}.topologyFcn


It can be set to the name of any topology function, including these toolbox
functions:





Topology Functions

gridtop    Grid layer topology function.
hextop     Hexagonal layer topology function.
randtop    Random layer topology function.
Custom Functions. See Chapter 12 for information on creating custom topology
functions.


Side Effects. Whenever this property is altered, the positions of the layer's
neurons (net.layers{i}.positions) are updated.
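A hedged sketch of a grid-style topology in plain MATLAB: every neuron gets integer coordinates, with the first coordinate varying fastest. It illustrates the idea of turning dimensions into positions and is not claimed to reproduce gridtop's exact output; hextop and randtop place neurons differently.

```matlab
% Hypothetical layer dimensions (stands in for net.layers{i}.dimensions)
dims = [3 2];
S = prod(dims);                      % number of neurons in the layer
positions = zeros(numel(dims), S);   % one column per neuron

for k = 1:S
  idx = cell(1, numel(dims));
  [idx{:}] = ind2sub(dims, k);       % 1-based grid coordinates of neuron k
  positions(:, k) = [idx{:}]' - 1;   % shift to 0-based coordinates
end
disp(positions)
```

The positions matrix has one row per dimension and one column per neuron, the same shape a distance function would take as input.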


Plotting. Use plotsom to plot the positions of a layer's neurons.


For instance, if the first-layer neurons of a network are arranged with
dimensions (net.layers{1}.dimensions) of [8 10] and the topology function
(net.layers{1}.topologyFcn) is randtop, the neurons' positions are arranged
roughly like those shown in the plot below.


plotsom(net.layers{1}.positions)


Figure: Neuron Positions (randtop layout)
transferFcn


This property defines the transfer function used to calculate the ith layer's
output, given the layer's net input.


net.layers{i}.transferFcn
It can be set to the name of any transfer function, including these toolbox
functions:





Transfer Functions

compet     Competitive transfer function.
hardlim    Hard-limit transfer function.
hardlims   Symmetric hard-limit transfer function.
logsig     Log-sigmoid transfer function.
poslin     Positive linear transfer function.
purelin    Linear transfer function.
radbas     Radial basis transfer function.
satlin     Saturating linear transfer function.
satlins    Symmetric saturating linear transfer function.
softmax    Soft max transfer function.
tansig     Hyperbolic tangent sigmoid transfer function.
tribas     Triangular basis transfer function.





The transfer function is used to simulate the network when sim is called.
[Y,Pf,Af] = sim(net,P,Pi,Ai)


Custom Functions. See Chapter 12 for information on creating custom transfer
functions.
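Several of the listed transfer functions have simple closed forms. The anonymous functions below are plain-MATLAB sketches of those forms (the names ending in _f are hypothetical, chosen to avoid shadowing the toolbox functions), applied element-wise to a hypothetical net input vector.

```matlab
% Hypothetical net input vector
n = [-2 -0.5 0 0.5 2];

% Closed-form sketches of some listed transfer functions (element-wise)
logsig_f   = @(x) 1 ./ (1 + exp(-x));        % log-sigmoid, output in (0,1)
tansig_f   = @(x) 2 ./ (1 + exp(-2*x)) - 1;  % tangent sigmoid, same values as tanh(x)
purelin_f  = @(x) x;                         % linear
hardlim_f  = @(x) double(x >= 0);            % 1 where x >= 0, else 0
hardlims_f = @(x) 2*double(x >= 0) - 1;      % +1 where x >= 0, else -1
poslin_f   = @(x) max(x, 0);                 % positive linear
satlin_f   = @(x) min(max(x, 0), 1);         % saturating linear, clipped to [0,1]
satlins_f  = @(x) min(max(x, -1), 1);        % symmetric saturating linear, [-1,1]
radbas_f   = @(x) exp(-x.^2);                % radial basis

disp([logsig_f(n); tansig_f(n); hardlim_f(n); satlins_f(n)])
```

The smooth functions (logsig, tansig, purelin) have derivatives everywhere, which matters for the backpropagation training functions listed earlier; the hard-limit functions do not.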


userdata 


This property provides a place for users to add custom information to the ith
network layer.


net.layers{i}.userdata


Only one field is predefined. It contains a secret message to all Neural Network
Toolbox users. 







net.layers{i}.userdata.note


Outputs


size (read-only)
This property defines the number of elements in the ith layer's output.


net.outputs{i}.size 


It is always set to the size of the ith layer (net.layers{i}.size).


userdata


This property provides a place for users to add custom information to the ith
layer's output.


net.outputs{i}.userdata 


Only one field is predefined. It contains a secret message to all Neural Network
Toolbox users.


net.outputs{i}.userdata.note 


Targets 


size (read-only)
This property defines the number of elements in the ith layer's target.


net.targets{i}.size


It is always set to the size of the ith layer (net.layers{i}.size).


userdata 


This property provides a place for users to add custom information to the ith
layer's target.


net.targets{i}.userdata 


Only one field is predefined. It contains a secret message to all Neural Network 
Toolbox users.


net.targets{i}.userdata.note 
Biases


initFcn 


This property defines the function used to initialize the ith layer's bias vector,
if the network initialization function is initlay and the ith layer's
initialization function is initwb.


net.biases{i}.initFcn


This property can be set to the name of any bias initialization function,
including these toolbox functions:





Bias Initialization Functions

initcon    Conscience bias initialization function.
initzero   Zero weight/bias initialization function.
rands      Symmetric random weight/bias initialization function.





This function is used to calculate an initial bias vector for the ith layer
(net.b{i}) when init is called, if the network initialization function
(net.initFcn) is initlay and the ith layer's initialization function
(net.layers{i}.initFcn) is initwb.


net = init(net) 


Custom Functions. See Chapter 12 for information on creating custom
initialization functions.


learn 


This property defines whether the ith bias vector is to be altered during 
training and adaption. 


net.biases{i}.learn
It can be set to 0 or 1.


It enables or disables the bias's learning during calls to either adapt or train:


[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai) 
[net,tr] = train(NET,P,T,Pi,Ai) 







learnFcn 


This property defines the function used to update the ith layer's bias vector
during training, if the network training function is trainb, trainc, or trainr,
or during adaption, if the network adapt function is trains.


net.biases{i}.learnFcn


It can be set to the name of any bias learning function, including these toolbox
functions:





Learning Functions

learncon   Conscience bias learning function.
learngd    Gradient descent weight/bias learning function.
learngdm   Grad. descent w/momentum weight/bias learning function.
learnp     Perceptron weight/bias learning function.
learnpn    Normalized perceptron weight/bias learning function.
learnwh    Widrow-Hoff weight/bias learning rule.





The learning function updates the ith layer's bias vector (net.b{i}) during calls to
train, if the network training function (net.trainFcn) is trainb, trainc, or
trainr, or during calls to adapt, if the network adapt function (net.adaptFcn)
is trains.


[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai) 
[net,tr] = train(NET,P,T,Pi,Ai) 


Custom Functions. See Chapter 12 for information on creating custom learning
functions.


Side Effects. Whenever this property is altered, the bias's learning parameters
(net.biases{i}.learnParam) are set to contain the fields and default values of
the new function.


learnParam 


This property defines the learning parameters and values for the current 
learning function of the ith layer's bias.
net.biases{i}.learnParam


The fields of this property depend on the current learning function
(net.biases{i}.learnFcn). Evaluate the above reference to see the fields of
the current learning function.


Call help on the current learning function to get a description of what each 
field means:


help(net.biases{i}.learnFcn)


size (read-only)
This property defines the size of the ith layer's bias vector.


net.biases{i}.size


It is always set to the size of the ith layer (net.layers{i}.size).


userdata 


This property provides a place for users to add custom information to the ith
layer's bias.


net.biases{i}.userdata 


Only one field is predefined. It contains a secret message to all Neural Network
Toolbox users. 


net.biases{i}.userdata.note


Input Weights


delays 


This property defines a tapped delay line between the jth input and its weight
to the ith layer.


net.inputWeights{i,j}.delays
It must be set to a row vector of increasing 0 or positive integer values.


Side Effects. Whenever this property is altered, the weight's size
(net.inputWeights{i,j}.size) and the dimensions of its weight matrix
(net.IW{i,j}) are updated.
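A tapped delay line presents the weight with the current input together with delayed copies, which is why the weight matrix has (input size) x (number of delays) columns. The plain-MATLAB sketch below builds that stacked vector for one time step. The input sequence is hypothetical, the stacking order (one full input vector per delay) is an assumption, and inputs before the first time step are taken as zero for simplicity.

```matlab
% Hypothetical 2-element input sequence over 4 time steps (columns = time)
P = [1 2 3 4;
     5 6 7 8];
delays = [0 1 2];             % stands in for net.inputWeights{i,j}.delays
R = size(P, 1);               % input size
t = 4;                        % time step at which to build the delayed vector

% Stack p(t - d) for each delay d; steps before t = 1 are taken as zero
pd = zeros(R * length(delays), 1);
for k = 1:length(delays)
  tk = t - delays(k);
  if tk >= 1
    pd((k-1)*R + (1:R)) = P(:, tk);
  end
end
disp(pd')
```

A weight matrix with R*length(delays) = 6 columns can then multiply pd directly (W * pd), matching the dimension rule given for net.IW{i,j}.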







initFcn 


This property defines the function used to initialize the weight matrix going to
the ith layer from the jth input, if the network initialization function is
initlay and the ith layer's initialization function is initwb.


net.inputWeights{i,j}.initFcn


This property can be set to the name of any weight initialization function,
including these toolbox functions:





Weight Initialization Functions

initzero   Zero weight/bias initialization function.
midpoint   Midpoint weight initialization function.
randnc     Normalized column weight initialization function.
randnr     Normalized row weight initialization function.
rands      Symmetric random weight/bias initialization function.





This function is used to calculate an initial weight matrix for the weight going
to the ith layer from the jth input (net.IW{i,j}) when init is called, if the
network initialization function (net.initFcn) is initlay and the ith layer's
initialization function (net.layers{i}.initFcn) is initwb.


net = init(net)


Custom Functions. See Chapter 12 for information on creating custom
initialization functions.


learn 


This property defines whether the weight matrix to the ith layer from the jth
input is to be altered during training and adaption.


net.inputWeights{i,j}.learn
It can be set to 0 or 1.


It enables or disables the weight's learning during calls to either adapt or
train:







[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai)
[net,tr] = train(NET,P,T,Pi,Ai) 


learnFcn 

This property defines the function used to update the weight matrix going to
the ith layer from the jth input during training, if the network training
function is trainb, trainc, or trainr, or during adaption, if the network adapt
function is trains.


net.inputWeights{i,j}.learnFcn


It can be set to the name of any weight learning function, including these
toolbox functions:





Weight Learning Functions 





learngd Gradient descent weight/bias learning function.
learngdm Grad. descent w/ momentum weight/bias learning function.
learnh Hebb-weight learning function.
learnhd Hebb with decay weight learning function.
learnis Instar-weight learning function.
learnk Kohonen-weight learning function.
learnlv1 LVQ1-weight learning function.
learnlv2 LVQ2-weight learning function.
learnos Outstar-weight learning function.
learnp Perceptron weight/bias learning function.
learnpn Normalized perceptron-weight/bias learning function.
learnsom Self-organizing map-weight learning function.
learnwh Widrow-Hoff weight/bias learning rule.
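
The table above lists learnp as the perceptron weight/bias learning function. The classic perceptron rule computes the error e = t - a, changes the weights by e times the input, and changes the bias by e. The plain-Python sketch below illustrates that rule. It is not learnp's calling interface: it assumes a single hardlim neuron, and the names hardlim, perceptron_step, and train_perceptron are hypothetical helpers.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1.0 if n >= 0 else 0.0


def perceptron_step(w, b, p, t):
    """One perceptron-rule update for a single neuron.

    e = t - a, new weights = w + e * p, new bias = b + e.
    Returns (new_w, new_b, e).
    """
    a = hardlim(sum(wk * pk for wk, pk in zip(w, p)) + b)
    e = t - a
    new_w = [wk + e * pk for wk, pk in zip(w, p)]
    return new_w, b + e, e


def train_perceptron(samples, n_inputs, max_epochs=100):
    """Cycle through (p, t) pairs until a full epoch makes no errors."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for p, t in samples:
            w, b, e = perceptron_step(w, b, p, t)
            if e != 0:
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


# Logical AND is linearly separable, so the rule settles on a solution.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_samples, n_inputs=2)
print(w, b)
```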





The learning function updates the weight matrix of the ith layer from the jth
input (net.IW{i,j}) during calls to train, if the network training function
(net.trainFcn) is trainb, trainc, or trainr, or during calls to adapt, if the
network adapt function (net.adaptFcn) is trains.


[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai) 
[net,tr] = train(NET,P,T,Pi,Ai) 


Custom Functions. See Chapter 12 for information on creating custom learning
functions.


learnParam 
This property defines the learning parameters and values for the current
learning function of the ith layer's weight coming from the jth input.


net.inputWeights{i,j}.learnParam
The fields of this property depend on the current learning function
(net.inputWeights{i,j}.learnFcn). Evaluate the above reference to see the
fields of the current learning function.


Call help on the current learning function to get a description of what each
field means:


help(net.inputWeights{i,j}.learnFcn)


size (read-only)
This property defines the dimensions of the ith layer's weight matrix from the
jth network input.

net.inputWeights{i,j}.size


It is always set to a two-element row vector indicating the number of rows and
columns of the associated weight matrix (net.IW{i,j}). The first element is
equal to the size of the ith layer (net.layers{i}.size). The second element is
equal to the product of the length of the weight's delay vector with the size of
the jth input:


length(net.inputWeights{i,j}.delays) * net.inputs{j}.size
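
As a quick check of the size rule, the plain-Python sketch below computes (rows, columns) of a weight matrix from the receiving layer's size, the source size, and the delay vector. The helper name weight_matrix_size is hypothetical and is not a toolbox function. Each tap delay contributes one delayed copy of the source vector, which is why the column count is a product.

```python
def weight_matrix_size(dest_layer_size, source_size, delays):
    """Return (rows, cols) of a weight matrix per the size rule above.

    dest_layer_size -- net.layers{i}.size (size of the receiving layer)
    source_size     -- net.inputs{j}.size (size of the source input)
    delays          -- net.inputWeights{i,j}.delays (the tap delays)
    """
    rows = dest_layer_size
    # One delayed copy of the source vector per tap delay.
    cols = len(delays) * source_size
    return rows, cols


# A 3-neuron layer fed by a 1-element input through taps at 0, 2, and 4.
print(weight_matrix_size(3, 1, [0, 2, 4]))  # (3, 3)
```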


userdata 


This property provides a place for users to add custom information to the
(i,j)th input weight.


net.inputWeights{i,j}.userdata 
Only one field is predefined. It contains a secret message to all Neural Network
Toolbox users.


net.inputWeights{i,j}.userdata.note 


weightFcn 


This property defines the function used to apply the ith layer's weight from the
jth input to that input.


net.inputWeights{i,j}.weightFcn


It can be set to the name of any weight function, including these toolbox
functions:





Weight Functions 





dist Euclidean-distance weight function.
dotprod Dot-product weight function.
mandist Manhattan-distance weight function.
negdist Negative-distance weight function.
normprod Normalized dot-product weight function.





The weight function is used when sim is called to simulate the network.

[Y,Pf,Af] = sim(net,P,Pi,Ai)

Custom Functions. See Chapter 12 for information on creating custom weight
functions.
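
dotprod, dist, mandist, and negdist each combine one weight row with an input vector in a different way. The plain-Python sketch below shows the per-row arithmetic their names describe: dot product, Euclidean distance, Manhattan distance, and negative Euclidean distance. It works on lists rather than MATLAB matrices, and apply_weight is a hypothetical helper, not a toolbox function. normprod is left out because its normalization is not spelled out in this section.

```python
import math


def dotprod(w, p):
    """Dot product of a weight row and an input vector."""
    return sum(wk * pk for wk, pk in zip(w, p))


def dist(w, p):
    """Euclidean distance between a weight row and an input vector."""
    return math.sqrt(sum((wk - pk) ** 2 for wk, pk in zip(w, p)))


def mandist(w, p):
    """Manhattan (city-block) distance."""
    return sum(abs(wk - pk) for wk, pk in zip(w, p))


def negdist(w, p):
    """Negative Euclidean distance: larger when w is closer to p."""
    return -dist(w, p)


def apply_weight(weight_fcn, W, p):
    """Weighted input z: one value per row of W (one per neuron)."""
    return [weight_fcn(row, p) for row in W]


W = [[1, 2], [3, 4]]
p = [1, 1]
for fcn in (dotprod, dist, mandist, negdist):
    print(fcn.__name__, apply_weight(fcn, W, p))
```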


Layer Weights 


delays 


This property defines a tapped delay line between the jth layer and its weight
to the ith layer.


net.layerWeights{i,j}.delays


It must be set to a row vector of increasing 0 or positive integer values.







initFcn 


This property defines the function used to initialize the weight matrix going to
the ith layer from the jth layer, if the network initialization function is
initlay, and the ith layer's initialization function is initwb.


net.layerWeights{i,j}.initFcn


This property can be set to the name of any weight initialization function,
including these toolbox functions:





Weight and Bias Initialization Functions





initzero Zero-weight/bias initialization function.
midpoint Midpoint-weight initialization function.
randnc Normalized column-weight initialization function.
randnr Normalized row-weight initialization function.
rands Symmetric random-weight/bias initialization function.





This function is used to calculate an initial weight matrix for the weight going
to the ith layer from the jth layer (net.LW{i,j}) when init is called, if the
network initialization function (net.initFcn) is initlay, and the ith layer's
initialization function (net.layers{i}.initFcn) is initwb.


net = init(net)


Custom Functions. See Chapter 12 for information on creating custom
initialization functions.


learn 


This property defines whether the weight matrix to the ith layer from the jth
layer is to be altered during training and adaption.


net.layerWeights{i,j}.learn
It can be set to 0 or 1.


It enables or disables the weight's learning during calls to either adapt or
train.

[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai)
[net,tr] = train(NET,P,T,Pi,Ai)


learnFcn 

This property defines the function used to update the weight matrix going to
the ith layer from the jth layer during training, if the network training
function is trainb, trainc, or trainr, or during adaption, if the network adapt
function is trains.


net.layerWeights{i,j}.learnFcn


It can be set to the name of any weight learning function, including these
toolbox functions:





Learning Functions





learngd Gradient-descent weight/bias learning function.
learngdm Grad. descent w/momentum weight/bias learning function.
learnh Hebb-weight learning function.
learnhd Hebb with decay weight learning function.
learnis Instar-weight learning function.
learnk Kohonen-weight learning function.
learnlv1 LVQ1-weight learning function.
learnlv2 LVQ2-weight learning function.
learnos Outstar-weight learning function.
learnp Perceptron-weight/bias learning function.
learnpn Normalized perceptron-weight/bias learning function.
learnsom Self-organizing map-weight learning function.
learnwh Widrow-Hoff weight/bias learning rule.





The learning function updates the weight matrix of the ith layer from the jth
layer (net.LW{i,j}) during calls to train, if the network training function
(net.trainFcn) is trainb, trainc, or trainr, or during calls to adapt, if the
network adapt function (net.adaptFcn) is trains.


[net,Y,E,Pf,Af] = adapt(NET,P,T,Pi,Ai) 
[net,tr] = train(NET,P,T,Pi,Ai) 


Custom Functions. See Chapter 12 for information on creating custom learning
functions.


learnParam 
This property defines the learning parameter fields and values for the current
learning function of the ith layer's weight coming from the jth layer.


net.layerWeights{i,j}.learnParam
The subfields of this property depend on the current learning function
(net.layerWeights{i,j}.learnFcn). Evaluate the above reference to see the
fields of the current learning function.


Call help on the current learning function to get a description of what each
field means:


help(net.layerWeights{i,j}.learnFcn)


size (read-only)
This property defines the dimensions of the ith layer's weight matrix from the
jth layer.


net.layerWeights{i,j}.size
It is always set to a two-element row vector indicating the number of rows and
columns of the associated weight matrix (net.LW{i,j}). The first element is
equal to the size of the ith layer (net.layers{i}.size). The second element is
equal to the product of the length of the weight's delay vector with the size of
the jth layer:


length(net.layerWeights{i,j}.delays) * net.layers{j}.size


userdata 


This property provides a place for users to add custom information to the
(i,j)th layer weight.


net.1ayerWeights{i,j}.userdata 
Only one field is predefined. It contains a secret message to all Neural Network
Toolbox users.


net.1ayerWeights{i,j}.userdata.note 


weightFcn 


This property defines the function used to apply the ith layer's weight from the
jth layer to that layer's output.


net.layerWeights{i,j}.weightFcn


It can be set to the name of any weight function, including these toolbox
functions:





Weight Functions 





dist Euclidean-distance weight function.
dotprod Dot-product weight function.
mandist Manhattan-distance weight function.
negdist Negative-distance weight function.
normprod Normalized dot-product weight function.





The weight function is used when sim is called to simulate the network. 


[Y,Pf,Af] = sim(net,P,Pi,Ai)


Custom Functions. See Chapter 12 for information on creating custom weight
functions.


Reference 
Functions: By Category


Functions by Network Type 





Function and Network Type 





assoclr Associative learning rules
backprop Backpropagation networks
elman Elman recurrent networks
hopfield Hopfield recurrent networks
linnet Linear networks
lvq Learning vector quantization
percept Perceptrons
radbasis Radial basis networks
selforg Self-organizing networks





Functions by Class





Analysis Functions





errsurf Error surface of a single input neuron.
maxlinlr Maximum learning rate for a linear neuron.


Distance Functions





boxdist Distance between two position vectors.
dist Euclidean distance weight function.
linkdist Link distance function.
mandist Manhattan distance weight function.








Graphical Interface Function 





nntool Neural Network Tool - Graphical User Interface.








Layer Initialization Functions





initnw Nguyen-Widrow layer initialization function. 


initwb By-weight-and-bias layer initialization function. 








Learning Functions





learncon Conscience bias learning function.
learngd Gradient descent weight/bias learning function.
learngdm Grad. descent w/momentum weight/bias learning function.
learnh Hebb weight learning function.
learnhd Hebb with decay weight learning rule.
learnis Instar weight learning function.
learnk Kohonen weight learning function.
learnlv1 LVQ1 weight learning function.
learnlv2 LVQ2 weight learning function.
learnos Outstar weight learning function.
learnp Perceptron weight and bias learning function.
learnpn Normalized perceptron weight and bias learning function.
learnsom Self-organizing map weight learning function.
learnwh Widrow-Hoff weight and bias learning rule.








Line Search Functions

srchbac One-dim. minimization using backtracking search.
srchbre One-dim. interval location using Brent's method.
srchcha One-dim. minimization using Charalambous' method.
srchgol One-dim. minimization using Golden section search.
srchhyb One-dim. minimization using Hybrid bisection/cubic search.








Net Input Derivative Functions 





dnetprod Product net input derivative function.
dnetsum Sum net input derivative function.


Net Input Functions





netprod Product net input function. 


netsum Sum net input function. 








Network Initialization Functions





initlay Layer-by-layer network initialization function.








Network Use Functions





adapt Allow a neural network to adapt.
disp Display a neural network's properties.
display Display a neural network variable's name and properties.
init Initialize a neural network.
sim Simulate a neural network.
train Train a neural network.








New Networks Functions 





network Create a custom neural network. 

newc Create a competitive layer. 

newcf Create a cascade-forward backpropagation network. 
newelm Create an Elman backpropagation network. 
newff Create a feed-forward backpropagation network.
newfftd Create a feed-forward input-delay backprop network.
newgrnn Design a generalized regression neural network.
newhop Create a Hopfield recurrent network.
newlin Create a linear layer.
newlind Design a linear layer.
newlvq Create a learning vector quantization network.
newp Create a perceptron.
newpnn Design a probabilistic neural network.
newrb Design a radial basis network.
newrbe Design an exact radial basis network.
newsom Create a self-organizing map.








Performance Derivative Functions





dmae Mean absolute error performance derivative function.
dmse Mean squared error performance derivative function.
dmsereg Mean squared error w/reg performance derivative function.
dsse Sum squared error performance derivative function.


Performance Functions

mae Mean absolute error performance function.
mse Mean squared error performance function.
msereg Mean squared error w/reg performance function.
sse Sum squared error performance function.
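
The three unregularized performance measures differ only in how they aggregate the errors. The sketch below shows each one on a flat Python list of errors (e = t - a). This is an assumption made for brevity, since the toolbox functions accept matrices and cell arrays. msereg is omitted.

```python
def sse(errors):
    """Sum squared error."""
    return sum(e * e for e in errors)


def mse(errors):
    """Mean squared error."""
    return sse(errors) / len(errors)


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


errors = [1.0, -2.0, 3.0]  # illustrative errors for three outputs
print(sse(errors), mse(errors), mae(errors))
```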








Plotting Functions

hintonw Hinton graph of weight matrix.
hintonwb Hinton graph of weight matrix and bias vector.
plotbr Plot network perf. for Bayesian regularization training.
plotep Plot weight and bias position on error surface.
plotes Plot error surface of single input neuron.
plotpc Plot classification line on perceptron vector plot.
plotperf Plot network performance.
plotpv Plot perceptron input target vectors.
plotsom Plot self-organizing map.
plotv Plot vectors as lines from the origin.
plotvec Plot vectors with different colors.


Pre and Post Processing Functions

postmnmx Unnormalize data which has been normalized by premnmx.
postreg Postprocess network response with linear regression analysis.
poststd Unnormalize data which has been normalized by prestd.
premnmx Normalize data for maximum of 1 and minimum of -1.
prepca Principal component analysis on input data.
prestd Normalize data for unity standard deviation and zero mean.
tramnmx Transform data with precalculated minimum and maximum.
trapca Transform data with PCA matrix computed by prepca.
trastd Transform data with precalculated mean and standard deviation.
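
premnmx maps data linearly so that the minimum becomes -1 and the maximum becomes 1, that is, pn = 2*(p - minp)/(maxp - minp) - 1. tramnmx reuses stored minp and maxp on new data, and postmnmx inverts the mapping. The sketch below is a hypothetical pure-Python version that handles a single row of data. The toolbox functions work row by row on matrices.

```python
def premnmx(p):
    """Scale one row of data to [-1, 1]; return (pn, minp, maxp)."""
    minp, maxp = min(p), max(p)
    if maxp == minp:
        raise ValueError("row is constant; the scaling is undefined")
    pn = [2.0 * (x - minp) / (maxp - minp) - 1.0 for x in p]
    return pn, minp, maxp


def tramnmx(p, minp, maxp):
    """Apply a previously computed min/max scaling to new data."""
    return [2.0 * (x - minp) / (maxp - minp) - 1.0 for x in p]


def postmnmx(pn, minp, maxp):
    """Undo the scaling: map normalized values back to the original range."""
    return [(x + 1.0) / 2.0 * (maxp - minp) + minp for x in pn]


pn, minp, maxp = premnmx([0, 5, 10])
print(pn, postmnmx(pn, minp, maxp), tramnmx([2.5], minp, maxp))
```

Keeping minp and maxp from the training data, rather than recomputing them, ensures that new inputs are scaled the same way the network saw during training.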








Simulink Support Function





gensim Generate a Simulink block for neural network simulation.








Topology Functions 





gridtop Grid layer topology function.
hextop Hexagonal layer topology function. 
randtop Random layer topology function. 













Training Functions

trainb Batch training with weight and bias learning rules.
trainbfg BFGS quasi-Newton backpropagation.
trainbr Bayesian regularization.
trainc Cyclical order incremental update.
traincgb Powell-Beale conjugate gradient backpropagation.
traincgf Fletcher-Reeves conjugate gradient backpropagation.
traincgp Polak-Ribiere conjugate gradient backpropagation.
traingd Gradient descent backpropagation.
traingda Gradient descent with adaptive lr backpropagation.
traingdm Gradient descent with momentum backpropagation.
traingdx Gradient descent with momentum & adaptive lr backprop.
trainlm Levenberg-Marquardt backpropagation.
trainoss One step secant backpropagation.
trainr Random order incremental update.
trainrp Resilient backpropagation (Rprop).
trains Sequential order incremental update.
trainscg Scaled conjugate gradient backpropagation.








Transfer Derivative Functions

dhardlim Hard limit transfer derivative function.
dhardlms Symmetric hard limit transfer derivative function.
dlogsig Log sigmoid transfer derivative function.
dposlin Positive linear transfer derivative function.
dpurelin Linear transfer derivative function.
dradbas Radial basis transfer derivative function.
dsatlin Saturating linear transfer derivative function.
dsatlins Symmetric saturating linear transfer derivative function.
dtansig Hyperbolic tangent sigmoid transfer derivative function.
dtribas Triangular basis transfer derivative function.








Transfer Functions 





compet Competitive transfer function.
hardlim Hard limit transfer function.
hardlims Symmetric hard limit transfer function.
logsig Log sigmoid transfer function.
poslin Positive linear transfer function.
purelin Linear transfer function.
radbas Radial basis transfer function.
satlin Saturating linear transfer function.
satlins Symmetric saturating linear transfer function.
softmax Softmax transfer function.
tansig Hyperbolic tangent sigmoid transfer function.
tribas Triangular basis transfer function.


Utility Functions

calca Calculate network outputs and other signals.
calca1 Calculate network signals for one time step.
calce Calculate layer errors.
calce1 Calculate layer errors for one time step.
calcgx Calc. weight and bias perform. gradient as a single vector.
calcjejj Calculate Jacobian performance vector.
calcjx Calculate weight and bias performance Jacobian as a single matrix.
calcpd Calculate delayed network inputs.
calcperf Calculate network outputs, signals, and performance.
formx Form bias and weights into single vector.
getx Get all network weight and bias values as a single vector.
setx Set all network weight and bias values with a single vector.








Vector Functions

cell2mat Combine a cell array of matrices into one matrix.
combvec Create all combinations of vectors.
con2seq Convert concurrent vectors to sequential vectors.
concur Create concurrent bias vectors.
ind2vec Convert indices to vectors.
mat2cell Break matrix up into cell array of matrices.
minmax Ranges of matrix rows.
normc Normalize columns of matrix.
normr Normalize rows of matrix.
pnormc Pseudo-normalize columns of matrix.
quant Discretize value as multiples of a quantity.
seq2con Convert sequential vectors to concurrent vectors.
sumsqr Sum squared elements of matrix.
vec2ind Convert vectors to indices.








Weight and Bias Initialization Functions

initcon Conscience bias initialization function.
initzero Zero weight and bias initialization function.
midpoint Midpoint weight initialization function.
randnc Normalized column weight initialization function.
randnr Normalized row weight initialization function.
rands Symmetric random weight/bias initialization function.
revert Change network weights and biases to previous initialization values.








Weight Derivative Function 





ddotprod Dot product weight derivative function.


Weight Functions

dist Euclidean distance weight function.
dotprod Dot product weight function.
mandist Manhattan distance weight function.
negdist Negative distance weight function.
normprod Normalized dot product weight function.





Transfer Functions

compet Competitive transfer function.
hardlim Hard limit transfer function.
hardlims Symmetric hard limit transfer function.
logsig Log sigmoid transfer function.
poslin Positive linear transfer function.
purelin Linear transfer function.
radbas Radial basis transfer function.
satlin Saturating linear transfer function.
satlins Symmetric saturating linear transfer function.
softmax Softmax transfer function.
tansig Hyperbolic tangent sigmoid transfer function.
tribas Triangular basis transfer function.





Transfer Function Graphs 


[Figure: graphs of the transfer functions]
Compet Transfer Function, a = compet(n): input n = [2 1 4 3], output a = [0 0 1 0]
Hard-Limit Transfer Function, a = hardlim(n)
Symmetric Hard-Limit Transfer Function, a = hardlims(n)
Log-Sigmoid Transfer Function, a = logsig(n)
Positive Linear Transfer Function, a = poslin(n)
Linear Transfer Function, a = purelin(n)
Radial Basis Function, a = radbas(n) (n axis marked at -0.833 and +0.833)
Satlin Transfer Function, a = satlin(n)
Satlins Transfer Function, a = satlins(n)
Softmax Transfer Function, a = softmax(n): input n = [0 1 -0.5 0.5], output a = [0.17 0.46 0.1 0.28]
Tan-Sigmoid Transfer Function, a = tansig(n)
Triangular Basis Function, a = tribas(n)
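
The graphs above plot the standard formulas for these transfer functions. The plain-Python scalar sketch below writes out those formulas so the example values can be checked. It is not the toolbox code, and it takes softmax and compet inputs as lists. Running it reproduces the compet and softmax examples, and radbas is about 0.5 at n = 0.833, which is where the radial basis graph marks its axis.

```python
import math


def hardlim(n):
    return 1.0 if n >= 0 else 0.0


def hardlims(n):
    return 1.0 if n >= 0 else -1.0


def purelin(n):
    return n


def poslin(n):
    return max(0.0, n)


def satlin(n):
    return min(1.0, max(0.0, n))


def satlins(n):
    return min(1.0, max(-1.0, n))


def logsig(n):
    # 1 / (1 + exp(-n)), written so that large |n| cannot overflow exp.
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    z = math.exp(n)
    return z / (1.0 + z)


def tansig(n):
    # Equal to 2 / (1 + exp(-2n)) - 1; tanh is the overflow-safe form.
    return math.tanh(n)


def radbas(n):
    return math.exp(-n * n)


def tribas(n):
    return max(0.0, 1.0 - abs(n))


def softmax(ns):
    # Subtracting the maximum avoids overflow without changing the result.
    m = max(ns)
    exps = [math.exp(x - m) for x in ns]
    total = sum(exps)
    return [x / total for x in exps]


def compet(ns):
    # 1 for the (first) largest net input, 0 everywhere else.
    k = ns.index(max(ns))
    return [1.0 if i == k else 0.0 for i in range(len(ns))]


print(compet([2, 1, 4, 3]))
print([round(x, 2) for x in softmax([0, 1, -0.5, 0.5])])
print(radbas(0.833))
```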
Functions: Alphabetical List


The function reference pages appear in this section, listed in alphabetical 
order. 


Reference Headings 


Following is a list of headings used in the function reference pages. Not every
function will have all this material, but the material that is included will be
ordered as shown.

- Purpose
- Graph and Symbol
- Syntax
- To Get Help
- Description
- Properties
- Examples
- Network Use
- Algorithm
- Limitations
- Notes
- See Also
- References
adapt





Purpose
Allow a neural network to adapt (change weights and biases on each
presentation of an input).

Syntax
[net,Y,E,Pf,Af] = adapt(net,P,T,Pi,Ai)

To Get Help
Type help network/adapt

Description
This function calculates network outputs and errors after each presentation of
an input.

[net,Y,E,Pf,Af,tr] = adapt(net,P,T,Pi,Ai) takes,

net - Network.
P - Network inputs.
T - Network targets, default = zeros.
Pi - Initial input delay conditions, default = zeros.
Ai - Initial layer delay conditions, default = zeros.

and returns the following after applying the adapt function net.adaptFcn with
the adaption parameters net.adaptParam:

net - Updated network.
Y - Network outputs.
E - Network errors.
Pf - Final input delay conditions.
Af - Final layer delay conditions.
tr - Training record (epoch and perf).

Note that T is optional and only needs to be used for networks that require
targets. Pi and Pf are also optional and only need to be used for networks that
have input or layer delays.

adapt's signal arguments can have two formats: cell array or matrix.
The cell array format is easiest to describe. It is most convenient for networks
with multiple inputs and outputs, and allows sequences of inputs to be
presented:

P - Ni x TS cell array, each element P{i,ts} is an Ri x Q matrix.
T - Nt x TS cell array, each element T{i,ts} is a Vi x Q matrix.
Pi - Ni x ID cell array, each element Pi{i,k} is an Ri x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.
Y - No x TS cell array, each element Y{i,ts} is a Ui x Q matrix.
E - Nt x TS cell array, each element E{i,ts} is a Vi x Q matrix.
Pf - Ni x ID cell array, each element Pf{i,k} is an Ri x Q matrix.
Af - Nl x LD cell array, each element Af{i,k} is an Si x Q matrix.

where

Ni = net.numInputs
Nl = net.numLayers
No = net.numOutputs
Nt = net.numTargets
ID = net.numInputDelays
LD = net.numLayerDelays
TS = Number of time steps
Q = Batch size
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Ui = net.outputs{i}.size
Vi = net.targets{i}.size

The columns of Pi, Pf, Ai, and Af are ordered from oldest delay condition to
most recent:

Pi{i,k} = input i at time ts = k-ID.
Pf{i,k} = input i at time ts = TS+k-ID.
Ai{i,k} = layer output i at time ts = k-LD.
Af{i,k} = layer output i at time ts = TS+k-LD.
The matrix format can be used if only one time step is to be simulated (TS = 1).
It is convenient for networks with only one input and output, but can be used
with networks that have more.

Each matrix argument is found by storing the elements of the corresponding
cell array argument in a single matrix:

P - (sum of Ri) x Q matrix.
T - (sum of Vi) x Q matrix.
Pi - (sum of Ri) x (ID*Q) matrix.
Ai - (sum of Si) x (LD*Q) matrix.
Y - (sum of Ui) x Q matrix.
E - (sum of Vi) x Q matrix.
Pf - (sum of Ri) x (ID*Q) matrix.
Af - (sum of Si) x (LD*Q) matrix.

Examples
Here two sequences of 12 steps (where t1 is known to depend on p1) are used
to define the operation of a filter.

p1 = {-1  0 1 0 1 1 -1  0 -1 1 0 1};
t1 = {-1 -1 1 1 1 2  0 -1 -1 0 1 1};

Here newlin is used to create a layer with an input range of [-1 1], one
neuron, input delays of 0 and 1, and a learning rate of 0.5.

net = newlin([-1 1],1,[0 1],0.5)

Here the network adapts for one pass through the sequence, and the network's
mean squared error is displayed. (Since this is the first call of adapt, the
default Pi is used.)

[net,y,e,pf] = adapt(net,p1,t1);
mse(e)

Note that the errors are quite large. Here the network adapts to another 12
time steps (using the previous Pf as the new initial delay conditions).

p2 = {1 -1 -1 1 1 -1  0 0 0 1 -1 -1};
t2 = {2  0 -2 0 2  0 -1 0 0 1  0 -1};
[net,y,e,pf] = adapt(net,p2,t2,pf);
mse(e)
Here the network adapts for 100 passes through the entire sequence.

p3 = [p1 p2];
t3 = [t1 t2];
net.adaptParam.passes = 100;
[net,y,e] = adapt(net,p3,t3);
mse(e)

The error after 100 passes through the sequence is very small. The network has
adapted to the relationship between the input and target signals.

Algorithm
adapt calls the function indicated by net.adaptFcn, using the adaption
parameter values indicated by net.adaptParam.

Given an input sequence with TS steps, the network is updated as follows: each
step in the sequence of inputs is presented to the network one at a time, and
the network's weight and bias values are updated after each step, before the
next step in the sequence is presented. Thus the network is updated TS times.

See Also
sim, init, train, revert
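
The Algorithm paragraph describes incremental adaption: present one step, update, then present the next step. The pure-Python sketch below replays the p1/t1 filter example that way. It is not the toolbox implementation, and it rests on several assumptions. The neuron sees z = [p(k), p(k-1)] (input delays 0 and 1). Updates follow the Widrow-Hoff rule (learnwh in the tables above) with lr = 0.5. Weights and bias start at zero, and the delayed input is reset to 0 at the start of every pass. Note that t1(k) = p1(k) + p1(k-1) for every step, so the ideal weights are [1, 1] with zero bias.

```python
def adapt_linear_filter(p, t, lr=0.5, passes=1):
    """Incrementally train a one-neuron linear filter with input delays 0 and 1.

    At step k: z = [p(k), p(k-1)], a = w . z + b, e = t(k) - a, then
    w <- w + lr * e * z and b <- b + lr * e before the next step.
    Returns (w, b, errors_from_last_pass).
    """
    w = [0.0, 0.0]   # assumed zero initial weights
    b = 0.0          # assumed zero initial bias
    errors = []
    for _ in range(passes):
        prev = 0.0   # initial input delay condition, assumed zero
        errors = []
        for pk, tk in zip(p, t):
            z = [pk, prev]
            a = w[0] * z[0] + w[1] * z[1] + b
            e = tk - a
            w = [wk + lr * e * zk for wk, zk in zip(w, z)]
            b += lr * e
            errors.append(e)
            prev = pk
    return w, b, errors


def mse(errors):
    return sum(e * e for e in errors) / len(errors)


p1 = [-1, 0, 1, 0, 1, 1, -1, 0, -1, 1, 0, 1]
t1 = [-1, -1, 1, 1, 1, 2, 0, -1, -1, 0, 1, 1]

_, _, e_one = adapt_linear_filter(p1, t1, passes=1)
w, b, e_many = adapt_linear_filter(p1, t1, passes=100)
print(mse(e_one), mse(e_many), w, b)
```

As in the MATLAB example, the error is large after one pass and becomes negligible after 100 passes, with the weights approaching [1, 1].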


boxdist

Purpose
Box distance function

Syntax
d = boxdist(pos);

Description
boxdist is a layer distance function that is used to find the distances between
the layer's neurons, given their positions.

boxdist(pos) takes one argument,

pos - N x S matrix of neuron positions.

and returns the S x S matrix of distances.

boxdist is most commonly used in conjunction with layers whose topology
function is gridtop.

Examples
Here we define a random matrix of positions for 10 neurons arranged in
three-dimensional space and then find their distances.

pos = rand(3,10);
d = boxdist(pos)

Network Use
You can create a standard network that uses boxdist as a distance function by
calling newsom.

To change a network so that a layer's topology uses boxdist, set
net.layers{i}.distanceFcn to 'boxdist'.

In either case, call sim to simulate the network with boxdist. See newsom for
training and adaption examples.

Algorithm
The box distance D between two position vectors Pi and Pj from a set of S
vectors is:

Dij = max(abs(Pi-Pj))

See Also
sim, dist, mandist, linkdist
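
The Algorithm formula is the largest coordinate difference between two neurons, also known as the Chebyshev distance. The pure-Python sketch below implements that formula. It takes positions as a nested list of N coordinate rows with one column per neuron, mirroring the N x S pos matrix. It is an illustration, not the toolbox function.

```python
def boxdist(pos):
    """Box (Chebyshev) distances between neurons.

    pos is an N x S nested list: N coordinate rows, one column per neuron.
    Returns the S x S list D with D[i][j] = max over k of |pos[k][i] - pos[k][j]|.
    """
    n_dims = len(pos)
    s = len(pos[0])
    return [
        [max(abs(pos[k][i] - pos[k][j]) for k in range(n_dims)) for j in range(s)]
        for i in range(s)
    ]


# Three neurons in 2-D at (0, 0), (1, 0), and (2, 1).
pos = [[0, 1, 2],
       [0, 0, 1]]
for row in boxdist(pos):
    print(row)
```

On an integer grid, diagonal neighbors such as (0, 0) and (1, 1) are at box distance 1, the same as horizontal and vertical neighbors.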
calca

Purpose
Calculate network outputs and other signals

Syntax
[Ac,N,LWZ,IWZ,BZ] = calca(net,Pd,Ai,Q,TS)

Description
This function calculates the outputs of each layer in response to a network's
delayed inputs and initial layer delay conditions.

[Ac,N,LWZ,IWZ,BZ] = calca(net,Pd,Ai,Q,TS) takes,

net - Neural network.
Pd - Delayed inputs.
Ai - Initial layer delay conditions.
Q - Concurrent size.
TS - Time steps.

and returns,

Ac - Combined layer outputs = [Ai, calculated layer outputs].
N - Net inputs.
LWZ - Weighted layer outputs.
IWZ - Weighted inputs.
BZ - Concurrent biases.

Examples
Here we create a linear network with a single input element ranging from 0 to
1, three neurons, and a tap delay on the input with taps at zero, two, and four
time steps. The network is also given a recurrent connection from layer 1 to
itself with tap delays of [1 2].

net = newlin([0 1],3,[0 2 4])
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];

Here is a single (Q = 1) input sequence P with eight time steps (TS = 8), and
the four initial input delay conditions Pi, combined inputs Pc, and delayed
inputs Pd.

P = {0 0.1 0.3 0.6 0.4 0.7 0.2 0.1};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,8,1,Pc)

Here the two initial layer delay conditions for each of the three neurons are
defined:

Ai = {[0.5; 0.1; 0.2] [0.6; 0.5; 0.2]};

Here we calculate the network's combined outputs Ac, and other signals
described above.

[Ac,N,LWZ,IWZ,BZ] = calca(net,Pd,Ai,1,8)
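
The delayed inputs combine the current input with earlier ones according to the input weight's tap delays. The hypothetical helper below (not calcpd, whose exact return layout is not described here) shows the idea for the example's single input with delays [0 2 4]. The combined sequence Pc = [Pi P] stores the oldest initial condition first, so at time step ts the tap with delay d reads the value at offset len(Pi) + ts - d. Each stacked vector has length(delays) * input size = 3 elements, which matches the input weight size rule.

```python
def tapped_delay_inputs(pi, p, delays):
    """Stack delayed copies of a scalar input sequence.

    pi     -- initial input delay conditions, oldest first (like Pi)
    p      -- input sequence, one value per time step (like P)
    delays -- tap delays, e.g. [0, 2, 4]
    Returns one list per time step: [p(ts - d) for d in delays].
    """
    if max(delays) > len(pi):
        raise ValueError("need at least max(delays) initial conditions")
    pc = list(pi) + list(p)  # combined sequence, like Pc = [Pi P]
    offset = len(pi)         # index of the first time step in pc
    return [[pc[offset + ts - d] for d in delays] for ts in range(len(p))]


P = [0, 0.1, 0.3, 0.6, 0.4, 0.7, 0.2, 0.1]
Pi = [0.2, 0.3, 0.4, 0.1]
for ts, z in enumerate(tapped_delay_inputs(Pi, P, [0, 2, 4]), start=1):
    print(ts, z)
```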
calca1

Purpose
Calculate network signals for one time step

Syntax
[Ac,N,LWZ,IWZ,BZ] = calca1(net,Pd,Ai,Q)

Description
This function calculates the outputs of each layer in response to a network's
delayed inputs and initial layer delay conditions, for a single time step.

Calculating outputs for a single time step is useful for sequential iterative
algorithms such as trains, which need to calculate the network response for
each time step individually.

[Ac,N,LWZ,IWZ,BZ] = calca1(net,Pd,Ai,Q) takes,

net - Neural network.
Pd - Delayed inputs for a single time step.
Ai - Initial layer delay conditions for a single time step.
Q - Concurrent size.

and returns,

A - Layer outputs for the time step.
N - Net inputs for the time step.
LWZ - Weighted layer outputs for the time step.
IWZ - Weighted inputs for the time step.
BZ - Concurrent biases for the time step.

Examples
Here we create a linear network with a single input element ranging from 0 to
1, three neurons, and a tap delay on the input with taps at zero, two, and four
time steps. The network is also given a recurrent connection from layer 1 to
itself with tap delays of [1 2].

net = newlin([0 1],3,[0 2 4])
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];

Here is a single (Q = 1) input sequence P with eight time steps (TS = 8), and
the four initial input delay conditions Pi, combined inputs Pc, and delayed
inputs Pd.

P = {0 0.1 0.3 0.6 0.4 0.7 0.2 0.1};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,8,1,Pc)

Here the two initial layer delay conditions for each of the three neurons are
defined:

Ai = {[0.5; 0.1; 0.2] [0.6; 0.5; 0.2]};

Here we calculate the network's combined outputs Ac, and other signals
described above.

[Ac,N,LWZ,IWZ,BZ] = calca(net,Pd,Ai,1,8)
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PurPpose 
Syntax 


Descripfion 


Examples 
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Calculate layer erTrors 
El = calce(net,Ac,Tl,TS)

This function calculates the errors of each layer in response to layer outputs
and targets.

El = calce(net,Ac,Tl,TS) takes,

net - Neural network.
Ac - Combined layer outputs.
Tl - Layer targets.
TS - Time steps.

and returns,

El - Layer errors.

Here we create a linear network with a single input element ranging from 0 to
1, two neurons, and a tap delay on the input with taps at 0, 2, and 4 time steps.
The network is also given a recurrent connection from layer 1 to itself with tap
delays of [1 2].

net = newlin([0 1],2,[0 2 4]);
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];

Here is a single (Q = 1) input sequence P with five time steps (TS = 5), and the
four initial input delay conditions Pi, combined inputs Pc, and delayed inputs
Pd.

P = {0 0.1 0.3 0.6 0.4};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,5,1,Pc);


Here the two initial layer delay conditions for each of the two neurons are
defined, and the network's combined outputs Ac and other signals are
calculated.

Ai = {[0.5; 0.1] [0.6; 0.5]};
[Ac,N,LWZ,IWZ,BZ] = calca(net,Pd,Ai,1,5)

Here we define the layer targets for the two neurons for each of the five time
steps, and calculate the layer errors.

Tl = {[0.1;0.2] [0.3;0.1] [0.5;0.6] [0.8;0.9] [0.5;0.1]};
El = calce(net,Ac,Tl,5)

Here we view the network's error for layer 1 at time step 2.
El{1,2}

calce1
Calculate layer errors for one time step
El = calce1(net,A,Tl)


This function calculates the errors of each layer in response to layer outputs
and targets, for a single time step. Calculating errors for a single time step is
useful for sequential iterative algorithms such as trains, which need to
calculate the network response for each time step individually.

El = calce1(net,A,Tl) takes,

net - Neural network.
A - Layer outputs, for a single time step.
Tl - Layer targets, for a single time step.

and returns,

El - Layer errors, for a single time step.

Here we create a linear network with a single input element ranging from 0 to
1, two neurons, and a tap delay on the input with taps at zero, two, and four
time steps. The network is also given a recurrent connection from layer 1 to
itself with tap delays of [1 2].

net = newlin([0 1],2,[0 2 4]);
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];

Here is a single (Q = 1) input sequence P with five time steps (TS = 5), and the
four initial input delay conditions Pi, combined inputs Pc, and delayed inputs
Pd.

P = {0 0.1 0.3 0.6 0.4};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,5,1,Pc);

Here the two initial layer delay conditions for each of the two neurons are
defined, and the network's combined outputs Ac and other signals are
calculated.

Ai = {[0.5; 0.1] [0.6; 0.5]};
[Ac,N,LWZ,IWZ,BZ] = calca(net,Pd,Ai,1,5)


Here we define the layer targets for the two neurons for each of the five time
steps, and calculate the layer errors using the first time step layer output
Ac(:,3) (the 3 is found by adding the number of layer delays, 2, to the time
step, 1) and the first time step targets Tl(:,1).

Tl = {[0.1;0.2] [0.3;0.1] [0.5;0.6] [0.8;0.9] [0.5;0.1]};
El = calce1(net,Ac(:,3),Tl(:,1))

Here we view the network's error for layer 1.

El{1}
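The error bookkeeping in calce and calce1 can be illustrated without the toolbox. The following is a minimal plain-MATLAB sketch, not the toolbox implementation. It assumes that errors are formed as targets minus layer outputs and that the combined outputs Ac hold the two initial layer delay states before the computed outputs. The Ac values are made up for illustration and are not the result of calca.

```matlab
% Layer targets for 2 neurons over 5 time steps (from the example above)
Tl = {[0.1;0.2] [0.3;0.1] [0.5;0.6] [0.8;0.9] [0.5;0.1]};

% Illustrative combined outputs: 2 initial delay states + 5 computed steps
% (made-up values, NOT the output of calca)
Ac = {[0.5;0.1] [0.6;0.5] [0.2;0.2] [0.4;0] [0.5;0.5] [0.7;0.7] [0.4;0.2]};

numLayerDelays = 2;        % largest layer delay, so outputs are offset by 2
TS = numel(Tl);
El = cell(1, TS);
for ts = 1:TS
  % the error at time step ts uses the output stored at ts + numLayerDelays
  El{ts} = Tl{ts} - Ac{ts + numLayerDelays};
end
```

With this offset, the first time step uses Ac{3}, which matches the Ac(:,3) indexing in the calce1 example.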
calcgx
Calculate weight and bias performance gradient as a single vector
[gX,normgX] = calcgx(net,X,Pd,BZ,IWZ,LWZ,N,Ac,El,perf,Q,TS)
This function calculates the gradient of a network's performance with respect
to its vector of weight and bias values X.

If the network has no layer delays with taps greater than 0, the result is the
true gradient.

If the network has layer delays greater than 0, the result is the Elman gradient,
an approximation of the true gradient.

[gX,normgX] = calcgx(net,X,Pd,BZ,IWZ,LWZ,N,Ac,El,perf,Q,TS) takes,

net - Neural network.
X - Vector of weight and bias values.
Pd - Delayed inputs.
BZ - Concurrent biases.
IWZ - Weighted inputs.
LWZ - Weighted layer outputs.
N - Net inputs.
Ac - Combined layer outputs.
El - Layer errors.
perf - Network performance.
Q - Concurrent size.
TS - Time steps.

and returns,

gX - Gradient dPerf/dX.
normgX - Norm of gradient.
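For a network without layer delays, the gradient is exact, so it is easy to check against finite differences. The sketch below does not call calcgx. It assumes a single static linear neuron a = W*p + b with mean squared error performance, uses illustrative data, and packs the parameters as X = [W(:); b]. That packing is an assumption of this sketch, and the ordering used by getx may differ. The code forms the analytic gradient dPerf/dX and its norm, then compares them with central differences.

```matlab
P = [0.1 0.5 -0.3; 0.7 -0.2 0.4];   % 2 inputs x 3 samples (illustrative)
T = [0.3 -0.1 0.2];                  % 1 target per sample
W = [0.4 -0.6];  b = 0.1;            % single linear neuron

% Assumed packing X = [W(:); b]; unpackW recovers the 1x2 weight row
unpackW = @(X) reshape(X(1:2), 1, 2);
perfOf  = @(X) mean(mean((T - (unpackW(X)*P + X(3))).^2));   % mse

X = [W(:); b];
E = T - (W*P + b);                   % errors (targets minus outputs)
Q = numel(E);

% Analytic gradient of mean(E.^2) with E = T - (W*P + b)
gW = -2/Q * E * P';
gb = -2/Q * sum(E);
gX = [gW(:); gb];
normgX = norm(gX);

% Central finite differences, one element of X at a time
h = 1e-6;
gFD = zeros(size(X));
for k = 1:numel(X)
  dx = zeros(size(X)); dx(k) = h;
  gFD(k) = (perfOf(X + dx) - perfOf(X - dx)) / (2*h);
end
```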


Here we create a linear network with a single input element ranging from 0 to
1, two neurons, and a tap delay on the input with taps at zero, two, and four
time steps. The network is also given a recurrent connection from layer 1 to
itself with tap delays of [1 2].

net = newlin([0 1],2,[0 2 4]);
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];


Here is a single (Q = 1) input sequence P with five time steps (TS = 5), and the 
four initial input delay conditions Pi, combined inputs Pc, and delayed inputs 
Pd. 


P = {0 0.1 0.3 0.6 0.4};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,5,1,Pc)

Here the two initial layer delay conditions for each of the two neurons, and the
layer targets for the two neurons over five time steps are defined.

Ai = {[0.5; 0.1] [0.6; 0.5]};
Tl = {[0.1;0.2] [0.3;0.1] [0.5;0.6] [0.8;0.9] [0.5;0.1]};

Here the network's weight and bias values are extracted, and the network's
performance and other signals are calculated.

X = getx(net);
[perf,El,Ac,N,BZ,IWZ,LWZ] = calcperf(net,X,Pd,Tl,Ai,1,5);

Finally we can use calcgx to calculate the gradient of performance with respect
to the weight and bias values X.

[gX,normgX] = calcgx(net,X,Pd,BZ,IWZ,LWZ,N,Ac,El,perf,1,5);


See Also: calcjx, calcjejj

calcjejj
Calculate Jacobian performance vector
[je,jj,normje] = calcjejj(net,Pd,BZ,IWZ,LWZ,N,Ac,El,Q,TS,MR)
This function calculates two values (related to the Jacobian of a network)
required to calculate the network's Hessian, in a memory-efficient way.

Two values needed to calculate the Hessian of a network are J'*E (Jacobian
times errors) and J'*J (Jacobian squared). However, the Jacobian J can take up
a lot of memory. This function calculates J'*E and J'*J by dividing up training
vectors into groups, calculating partial Jacobians Ji and their associated values
Ji'*Ei and Ji'*Ji, then summing the partial values into the full J'*E and J'*J
values.

This allows the J'*E and J'*J values to be calculated with a series of smaller Ji
matrices, instead of one larger J matrix.

[je,jj,normje] = calcjejj(net,Pd,BZ,IWZ,LWZ,N,Ac,El,Q,TS,MR) takes,

net - Neural network.
Pd - Delayed inputs.
BZ - Concurrent biases.
IWZ - Weighted inputs.
LWZ - Weighted layer outputs.
N - Net inputs.
Ac - Combined layer outputs.
El - Layer errors.
Q - Concurrent size.
TS - Time steps.
MR - Memory reduction factor.

and returns,

je - Jacobian times errors.
jj - Jacobian transposed times the Jacobian.
normje - Norm of gradient.
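The memory reduction works because J'*E and J'*J are sums of contributions from the rows of J. The sketch below uses a made-up J and E. It forms both products over MR row groups and checks that they match the full products. In practice, each Ji would be computed directly rather than sliced from a stored J. The slicing here exists only so the two results can be compared, and the grouping rule is an assumption of this sketch.

```matlab
% Illustrative Jacobian (10 errors x 4 weights) and error vector
J = reshape(mod((1:40)*7, 11), 10, 4) - 5;
E = (1:10)' / 10;

% Full products, which need the whole J in memory at once
jeFull = J' * E;
jjFull = J' * J;

% Memory-reduced version: accumulate over MR groups of rows
MR = 3;
groups = round(linspace(0, size(J, 1), MR + 1));   % boundaries [0 3 7 10]
je = zeros(size(J, 2), 1);
jj = zeros(size(J, 2));
for g = 1:MR
  rows = groups(g)+1 : groups(g+1);
  Ji = J(rows, :);             % partial Jacobian for this group
  Ei = E(rows);                % matching errors
  je = je + Ji' * Ei;          % accumulate J'*E
  jj = jj + Ji' * Ji;          % accumulate J'*J
end
normje = norm(je);
```

The same totals result for any MR, which is why the choice of memory reduction should not change the output.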
Here we create a linear network with a single input element ranging from 0 to
1, two neurons, and a tap delay on the input with taps at zero, two, and four
time steps. The network is also given a recurrent connection from layer 1 to
itself with tap delays of [1 2].

net = newlin([0 1],2,[0 2 4]);
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];


Here is a single (Q = 1) input sequence P with five time steps (TS = 5), and the 
four initial input delay conditions Pi, combined inputs Pc, and delayed inputs 
Pd. 


P = {0 0.1 0.3 0.6 0.4};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,5,1,Pc);

Here the two initial layer delay conditions for each of the two neurons, and the
layer targets for the two neurons over five time steps are defined.

Ai = {[0.5; 0.1] [0.6; 0.5]};
Tl = {[0.1;0.2] [0.3;0.1] [0.5;0.6] [0.8;0.9] [0.5;0.1]};

Here the network's weight and bias values are extracted, and the network's
performance and other signals are calculated.

X = getx(net);
[perf,El,Ac,N,BZ,IWZ,LWZ] = calcperf(net,X,Pd,Tl,Ai,1,5);

Finally we can use calcjejj to calculate the Jacobian times error, Jacobian
squared, and the norm of the Jacobian times error using a memory reduction
of 2.

[je,jj,normje] = calcjejj(net,Pd,BZ,IWZ,LWZ,N,Ac,El,1,5,2);

The results should be the same regardless of the memory reduction used. Here a
memory reduction of 3 is used.

[je,jj,normje] = calcjejj(net,Pd,BZ,IWZ,LWZ,N,Ac,El,1,5,3);


See Also: calcgx, calcjx

calcjx
Calculate weight and bias performance Jacobian as a single matrix
jX = calcjx(net,Pd,BZ,IWZ,LWZ,N,Ac,Q,TS)

This function calculates the Jacobian of a network's errors with respect to its
vector of weight and bias values X.
jX = calcjx(net,Pd,BZ,IWZ,LWZ,N,Ac,Q,TS) takes,

net - Neural network.
Pd - Delayed inputs.
BZ - Concurrent biases.
IWZ - Weighted inputs.
LWZ - Weighted layer outputs.
N - Net inputs.
Ac - Combined layer outputs.
Q - Concurrent size.
TS - Time steps.

and returns,

jX - Jacobian of network errors with respect to X.
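To see what such a Jacobian contains, consider the simplest case. The sketch below does not call calcjx. It assumes a single static linear neuron with errors E = T - (W*P + b) and the packing X = [W(:); b], both chosen for this illustration only. Each row of the Jacobian corresponds to one error and each column to one weight or bias, so the analytic result is -[P' ones(Q,1)]. The code checks this against finite differences.

```matlab
P = [0.1 0.5 -0.3; 0.7 -0.2 0.4];   % 2 inputs x 3 samples (illustrative)
T = [0.3 -0.1 0.2];
X = [0.4; -0.6; 0.1];                % assumed packing [W(:); b]

% Errors as a Q x 1 column for a given parameter vector X
errOf = @(X) (T - (reshape(X(1:2), 1, 2)*P + X(3)))';

Q  = size(P, 2);
jX = -[P' ones(Q, 1)];               % dE/dX, size Q x numel(X)

% Finite-difference check, one column per element of X
h = 1e-6;
jFD = zeros(Q, numel(X));
for k = 1:numel(X)
  dx = zeros(size(X)); dx(k) = h;
  jFD(:, k) = (errOf(X + dx) - errOf(X - dx)) / (2*h);
end
```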


Here we create a linear network with a single input element ranging from 0 to
1, two neurons, and a tap delay on the input with taps at zero, two, and four
time steps. The network is also given a recurrent connection from layer 1 to
itself with tap delays of [1 2].

net = newlin([0 1],2,[0 2 4]);
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];


Here is a single (Q = 1) input sequence P with five time steps (TS = 5), and the 
four initial input delay conditions Pi, combined inputs Pc, and delayed inputs 
Pd. 


P = {0 0.1 0.3 0.6 0.4};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,5,1,Pc);

Here the two initial layer delay conditions for each of the two neurons, and the
layer targets for the two neurons over five time steps are defined.

Ai = {[0.5; 0.1] [0.6; 0.5]};
Tl = {[0.1;0.2] [0.3;0.1] [0.5;0.6] [0.8;0.9] [0.5;0.1]};

Here the network's weight and bias values are extracted, and the network's
performance and other signals are calculated.

X = getx(net);
[perf,El,Ac,N,BZ,IWZ,LWZ] = calcperf(net,X,Pd,Tl,Ai,1,5);

Finally we can use calcjx to calculate the Jacobian.

jX = calcjx(net,Pd,BZ,IWZ,LWZ,N,Ac,1,5);

See Also: calcgx, calcjejj

calcpd
Calculate delayed network inputs
Pd = calcpd(net,TS,Q,Pc) 
This function calculates the results of passing the network inputs through each
input weight's tap delay line.
Pd = calcpd(net,TS,Q,Pc) takes,
net - Neural network.
TS - Time steps.
Q - Concurrent size.
Pc - Combined inputs = [initial delay conditions, network inputs].

and returns,
Pd - Delayed inputs.
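The output of calcpd can be pictured by writing out a tap delay line by hand. The following is a minimal plain-MATLAB sketch, not the toolbox implementation. It makes three assumptions: there is one input with taps [0 2 4], Pc holds the four initial delay conditions followed by the inputs, and the delayed copies are stacked in tap order [p(t); p(t-2); p(t-4)]. The anonymous helper tapdelay and the 1xTS cell layout are hypothetical simplifications of the toolbox's Pd{i,j,ts} indexing.

```matlab
% Combined inputs: 4 initial delay conditions followed by 8 time steps
Pi = {0.2 0.3 0.4 0.1};
P  = {0 0.1 0.3 0.6 0.4 0.7 0.2 0.1};
Pc = [Pi P];

delays   = [0 2 4];        % tap delays of the input weight
maxDelay = numel(Pi);      % time step 1 sits at Pc index maxDelay + 1

% Hypothetical helper: stack p(t-d) for every tap d at time step ts
tapdelay = @(ts) cell2mat(Pc(ts + maxDelay - delays)');

TS = numel(P);
Pd = cell(1, TS);
for ts = 1:TS
  Pd{ts} = tapdelay(ts);   % 3x1 column [p(t); p(t-2); p(t-4)]
end
```

At time step 1, the taps read Pc{5}, Pc{3}, and Pc{1}, so two of the three values come from the initial delay conditions.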
Here we create a linear network with a single input element ranging from 0 to
1, three neurons, and a tap delay on the input with taps at zero, two, and four
time steps.

net = newlin([0 1],3,[0 2 4]);

Here is a single (Q = 1) input sequence P with eight time steps (TS = 8).
P = {0 0.1 0.3 0.6 0.4 0.7 0.2 0.1};

Here we define the four initial input delay conditions Pi.

Pi = {0.2 0.3 0.4 0.1};
The delayed inputs (the inputs after passing through the tap delays) can be
calculated with calcpd.

Pc = [Pi P];

Pd = calcpd(net,8,1,Pc)
Here we view the delayed inputs for the input weight going to layer 1 from input
1 at time steps 1 and 2.

Pd{1,1,1}
Pd{1,1,2}


calcperf
Calculate network outputs, signals, and performance
[perf,El,Ac,N,BZ,IWZ,LWZ] = calcperf(net,X,Pd,Tl,Ai,Q,TS)

This function calculates the outputs of each layer in response to a network's
delayed inputs and initial layer delay conditions.

[perf,El,Ac,N,BZ,IWZ,LWZ] = calcperf(net,X,Pd,Tl,Ai,Q,TS) takes,
net - Neural network.
X - Network weight and bias values in a single vector.
Pd - Delayed inputs.
Tl - Layer targets.
Ai - Initial layer delay conditions.
Q - Concurrent size.
TS - Time steps.

and returns,

perf - Network performance.
El - Layer errors.
Ac - Combined layer outputs = [Ai, calculated layer outputs].
N - Net inputs.
LWZ - Weighted layer outputs.
IWZ - Weighted inputs.
BZ - Concurrent biases.


Here we create a linear network with a single input element ranging from 0 to
1, two neurons, and a tap delay on the input with taps at zero, two, and four
time steps. The network is also given a recurrent connection from layer 1 to
itself with tap delays of [1 2].

net = newlin([0 1],2,[0 2 4]);
net.layerConnect(1,1) = 1;
net.layerWeights{1,1}.delays = [1 2];

Here is a single (Q = 1) input sequence P with five time steps (TS = 5), and the
four initial input delay conditions Pi, combined inputs Pc, and delayed inputs
Pd.

P = {0 0.1 0.3 0.6 0.4};
Pi = {0.2 0.3 0.4 0.1};
Pc = [Pi P];
Pd = calcpd(net,5,1,Pc);


Here the two initial layer delay conditions for each of the two neurons are 
defined. 


Ai = {[0.5; 0.1] [0.6; 0.5]}; 


Here we define the layer targets for the two neurons for each of the five time
steps.

Tl = {[0.1;0.2] [0.3;0.1] [0.5;0.6] [0.8;0.9] [0.5;0.1]};
Here the network's weight and bias values are extracted.
X = getx(net);

Here we calculate the network's combined outputs Ac, and other signals
described above.

[perf,El,Ac,N,BZ,IWZ,LWZ] = calcperf(net,X,Pd,Tl,Ai,1,5)


combvec
Create all combinations of vectors
combvec(A1,A2,...)

combvec(A1,A2,...) takes any number of inputs,

A1 - Matrix of N1 (column) vectors.
A2 - Matrix of N2 (column) vectors.

and returns a matrix of (N1*N2*...) column vectors, where the columns
consist of all possibilities of A2 vectors, appended to A1 vectors, etc.

a1 = [1 2 3; 4 5 6];
a2 = [7 8; 9 10];
a3 = combvec(a1,a2)
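For two input matrices, the combinations can be formed with repmat and kron. This is a minimal sketch with a hypothetical helper, combvec2, that handles only two arguments and is not the toolbox implementation. It assumes that the columns of the first matrix cycle fastest in the result.

```matlab
a1 = [1 2 3; 4 5 6];
a2 = [7 8; 9 10];

% Hypothetical two-argument version: repeat a1 once per column of a2,
% and repeat each column of a2 once per column of a1
combvec2 = @(a1, a2) [repmat(a1, 1, size(a2, 2)); ...
                      kron(a2, ones(1, size(a1, 2)))];

a3 = combvec2(a1, a2)
```

Each of the 3*2 = 6 result columns stacks one column of a1 on top of one column of a2.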
compet
Competitive transfer function

[Figure: Compet Transfer Function. An input n = [2; 1; 4; 3] produces the output a = [0; 0; 1; 0], where a = compet(n).]

A = compet(N)

info = compet(code)

compet is a transfer function. Transfer functions calculate a layer's output from
its net input.

compet(N) takes one input argument,

N - SxQ matrix of net input (column) vectors,
and returns output vectors with 1 where each net input vector has its
maximum value, and 0 elsewhere.
compet(code) returns information about this function.

These codes are defined:

'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.
compet does not have a derivative function.
In many network paradigms it is useful to have a layer whose neurons compete
for the ability to output a 1. In biology this is done by strong inhibitory
connections between each of the neurons in a layer. The result is that the only
neuron that can respond with appreciable output is the neuron whose net input
is the highest. All other neurons are inhibited so strongly by the winning
neuron that their outputs are negligible.


To model this type of layer efficiently on a computer, a competitive transfer
function is often used. Such a function transforms the net input vector of a
layer of neurons so that the neuron receiving the greatest net input has an
output of 1 and all other neurons have outputs of 0.
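The winner-take-all mapping can be sketched in plain MATLAB with the second output of max. This is an illustration, not the toolbox implementation. Sending ties to the first maximum is a property of this sketch, since max returns the first index, and is assumed rather than confirmed for compet.

```matlab
n = [0 1 -0.5 0.5; 2 1 4 3]';   % two net input vectors as columns (4x2)

% One-hot output: 1 at the maximum of each column, 0 elsewhere
[~, winner] = max(n, [], 1);     % row index of the largest entry per column
a = zeros(size(n));
a(sub2ind(size(n), winner, 1:size(n, 2))) = 1;
```

The second column reproduces the figure: [2; 1; 4; 3] maps to [0; 0; 1; 0].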


Here we define a net input vector N, calculate the output, and plot both with 
bar graphs. 
n = [0; 1; -0.5; 0.5]
a = compet(n);
subplot(2,1,1), bar(n), ylabel('n')
subplot(2,1,2), bar(a), ylabel('a')

You can create a standard network that uses compet by calling newc or newpnn.

To change a network so a layer uses compet, set
net.layers{i}.transferFcn to 'compet'.

In either case, call sim to simulate the network with compet.

See newc or newpnn for simulation examples.

See Also: sim, softmax

con2seq
Convert concurrent vectors to sequential vectors
S = con2seq(b)
The Neural Network Toolbox arranges concurrent vectors with a matrix, and
sequential vectors with a cell array (where the second index is the time step).

con2seq and seq2con allow concurrent vectors to be converted to sequential
vectors, and back again.

con2seq(b) takes one input,

b - RxTS matrix.

and returns one output,

S - 1xTS cell array of Rx1 vectors.
con2seq(b,TS) can also convert multiple batches,

b - Nx1 cell array of matrices with M*TS columns.
TS - Time steps.

and will return,

S - NxTS cell array of matrices with M columns.
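For a single batch, the conversion amounts to splitting the matrix into its columns. In core MATLAB, num2cell with dimension 1 does this, and cell2mat reverses it. This sketch covers only the single-batch case and is not how con2seq is implemented.

```matlab
b = [1 4 2];             % 1x3 batch: R = 1, TS = 3
S = num2cell(b, 1);      % 1x3 cell array, one Rx1 column per time step

% Reverse direction for one sequence (the role seq2con plays)
b2 = cell2mat(S);
```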


Here a batch of three values is converted to a sequence. 


p1 = [1 4 2]
p2 = con2seq(p1)

Here two batches of vectors are converted to two sequences with two time steps.

p1 = {[1 3 4 5; 1 1 7 4]; [7 3 4 4; 6 9 4 1]}
p2 = con2seq(p1,2)

See Also: seq2con, concur

concur
Create concurrent bias vectors
concur(B,Q)

concur(B,Q) takes these inputs,

B - Sx1 bias vector (or Nlx1 cell array of vectors).
Q - Concurrent size.

and returns an SxQ matrix of copies of B (or Nlx1 cell array of matrices).
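In plain MATLAB, making Q copies of a bias vector is a single repmat call. The sketch below covers only the single-vector case, not the cell array form, and illustrates the shape of the result rather than the toolbox code.

```matlab
b = [1; 3; 2; -1];       % Sx1 bias vector
Q = 3;                   % concurrent size
B = repmat(b, 1, Q);     % SxQ matrix whose columns are all equal to b
```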


Here concur creates three copies of a bias vector. 
b = [1; 3; 2; -1];
concur(b,3)

To calculate a layer's net input, the layer's weighted inputs must be combined
with its biases. The following expression calculates the net input for a layer
with the netsum net input function, two input weights, and a bias:

n = netsum(z1,z2,b)

The above expression works if z1, z2, and b are all Sx1 vectors. However, if
the network is being simulated by sim (or adapt or train) in response to Q
concurrent vectors, then z1 and z2 will be SxQ matrices. Before b can be
combined with z1 and z2, we must make Q copies of it.

n = netsum(z1,z2,concur(b,Q))

See Also: netsum, netprod, sim, seq2con, con2seq

ddotprod
Dot product weight derivative function

dZ_dP = ddotprod('p',W,P,Z)
dZ_dW = ddotprod('w',W,P,Z)

ddotprod is a weight derivative function.
ddotprod('p',W,P,Z) takes three arguments,

W - SxR weight matrix.
P - RxQ inputs.
Z - SxQ weighted input.
and returns the SxR derivative dZ/dP.

ddotprod('w',W,P,Z) returns the RxQ derivative dZ/dW.


Here we define a weight W and input P for an input with three elements and a
layer with two neurons.

W = [0 -1 0.2; -1.1 1 0];
P = [0.1; 0.6; -0.2];

Here we calculate the weighted input with dotprod, then calculate each
derivative with ddotprod.

Z = dotprod(W,P)

dZ_dP = ddotprod('p',W,P,Z)
dZ_dW = ddotprod('w',W,P,Z)

The derivative of a product of two elements with respect to one element is the
other element.

dZ/dP = W
dZ/dW = P

See Also: dotprod

dhardlim
Derivative of hard limit transfer function
dA_dN = dhardlim(N,A)


dhardlim is the derivative function for hardlim.
dhardlim(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.
and returns the SxQ derivative dA/dN.

Here we define the net input N for a layer of 3 hardlim neurons.
N = [0.1; 0.8; -0.7];

We calculate the layer's output A with hardlim and then the derivative of A
with respect to N.

A = hardlim(N)
dA_dN = dhardlim(N,A)


The derivative of hardlim is calculated as follows: 


d = 0 


See Also: hardlim

dhardlms
Derivative of symmetric hard limit transfer function
dA_dN = dhardlms(N,A)

dhardlms is the derivative function for hardlims.
dhardlms(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.
and returns the SxQ derivative dA/dN.

Here we define the net input N for a layer of 3 hardlims neurons.
N = [0.1; 0.8; -0.7]

We calculate the layer's output A with hardlims and then the derivative of A
with respect to N.

A = hardlims(N)
dA_dN = dhardlms(N,A)


The derivative of hardlims is calculated as follows: 


d = 0 


See Also: hardlims

disp
Display a neural network's properties
disp(net)

Type help network/disp

disp(net) displays a network's properties.

Here a perceptron is created and displayed.

net = newp([-1 1; 0 2],3);
disp(net)

See Also: display, sim, init, train, adapt

display
Display the name and properties of a neural network's variables
display(net)

Type help network/display

display(net) displays a network variable's name and properties.


Here a perceptron variable is defined and displayed. 


net = newp([-1 1; 0 2],3); 
display(net) 


display is automatically called as follows: 


net 


See Also: disp, sim, init, train, adapt

dist
Euclidean distance weight function

Z = dist(W,P)

df = dist('deriv')

D = dist(pos)

dist is the Euclidean distance weight function. Weight functions apply weights
to an input to get weighted inputs.

dist(W,P) takes these inputs,

W - SxR weight matrix.
P - RxQ matrix of Q input (column) vectors.
and returns the SxQ matrix of vector distances.

dist('deriv') returns '' because dist does not have a derivative function.

dist is also a layer distance function, which can be used to find the distances
between neurons in a layer.

dist(pos) takes one argument,

pos - NxS matrix of neuron positions.

and returns the SxS matrix of distances.

Here we define a random weight matrix W and input vector P and calculate the
corresponding weighted input Z.

W = rand(4,3)
P = rand(3,1);
Z = dist(W,P)

Here we define a random matrix of positions for 10 neurons arranged in
three-dimensional space and find their distances.

pos = rand(3,10);
D = dist(pos)

You can create a standard network that uses dist by calling newpnn or
newgrnn.

To change a network so an input weight uses dist, set
net.inputWeights{i,j}.weightFcn to 'dist'.

For a layer weight, set net.layerWeights{i,j}.weightFcn to 'dist'.

To change a network so that a layer's topology uses dist, set
net.layers{i}.distanceFcn to 'dist'.

In either case, call sim to simulate the network with dist.

See newpnn or newgrnn for simulation examples.

The Euclidean distance d between two vectors x and y is:

d = sum((x-y).^2).^0.5
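The formula above, applied to every pair of weight row and input column, gives the SxQ result described for dist(W,P). The sketch below writes this out with explicit loops on small integer data so the distances are easy to verify by hand. It is an illustration of the formula, not the toolbox code.

```matlab
W = [0 0; 3 4; 1 1];      % S = 3 weight vectors (rows), R = 2 elements each
P = [0 3; 0 4];           % Q = 2 input vectors (columns)

S = size(W, 1);
Q = size(P, 2);
Z = zeros(S, Q);
for i = 1:S
  for q = 1:Q
    % Euclidean distance between weight row i and input column q
    Z(i, q) = sqrt(sum((W(i, :)' - P(:, q)).^2));
  end
end
```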


See Also: sim, dotprod, negdist, normprod, mandist, linkdist

dlogsig
Log sigmoid transfer derivative function
dA_dN = dlogsig(N,A)

dlogsig is the derivative function for logsig.
dlogsig(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.

and returns the SxQ derivative dA/dN.
Here we define the net input N for a layer of 3 logsig neurons.
N = [0.1; 0.8; -0.7];

We calculate the layer's output A with logsig and then the derivative of A with
respect to N.

A = logsig(N)
dA_dN = dlogsig(N,A)

The derivative of logsig is calculated as follows:

d = a*(1-a)
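Since logsig computes a = 1/(1+exp(-n)), its derivative can be written in terms of the output alone. The sketch below checks d = a*(1-a) against a central finite difference in plain MATLAB, without calling the toolbox functions.

```matlab
N = [0.1; 0.8; -0.7];
A = 1 ./ (1 + exp(-N));          % log-sigmoid output
dA_dN = A .* (1 - A);            % derivative expressed through the output

% Central finite difference of the log-sigmoid at each net input
h = 1e-6;
fd = (1 ./ (1 + exp(-(N + h))) - 1 ./ (1 + exp(-(N - h)))) / (2*h);
```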


See Also: logsig, tansig, dtansig

dmae
Mean absolute error performance derivative function

dPerf_dE = dmae('e',E,X,perf,PP)
dPerf_dX = dmae('x',E,X,perf,PP)

dmae is the derivative function for mae.
dmae('e',E,X,perf,PP) takes these arguments,

E - Matrix or cell array of error vector(s).
X - Vector of all weight and bias values.
perf - Network performance (ignored).
PP - Performance parameters (ignored).
and returns the derivative dPerf/dE.

dmae('x',E,X,perf,PP) returns the derivative dPerf/dX.

Here we define E and X for a network with one 3-element output and six weight
and bias values.

E = {[1; -2; 0.5]};
X = [0; 0.2; -2.2; 4.1; 0.1; -0.2];

Here we calculate the network's mean absolute error performance, and
derivatives of performance.

perf = mae(E)
dPerf_dE = dmae('e',E,X)
dPerf_dX = dmae('x',E,X)


Note that mae can be called with only one argument and dmae with only three
arguments because the other arguments are ignored. The other arguments
exist so that mae and dmae conform to standard performance function argument
lists.

See Also: mae

dmse
Mean squared error performance derivative function

dPerf_dE = dmse('e',E,X,perf,PP)
dPerf_dX = dmse('x',E,X,perf,PP)

dmse is the derivative function for mse.
dmse('e',E,X,perf,PP) takes these arguments,

E - Matrix or cell array of error vector(s).
X - Vector of all weight and bias values.
perf - Network performance (ignored).
PP - Performance parameters (ignored).
and returns the derivative dPerf/dE.

dmse('x',E,X,perf,PP) returns the derivative dPerf/dX.

Here we define E and X for a network with one 3-element output and six weight
and bias values.

E = {[1; -2; 0.5]}
X = [0; 0.2; -2.2; 4.1; 0.1; -0.2];

Here we calculate the network's mean squared error performance, and
derivatives of performance.

perf = mse(E)
dPerf_dE = dmse('e',E,X)
dPerf_dX = dmse('x',E,X)

Note that mse can be called with only one argument and dmse with only three
arguments because the other arguments are ignored. The other arguments
exist so that mse and dmse conform to standard performance function argument
lists.
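The mean squared error of E is sum(E(:).^2)/numel(E), so its derivative with respect to each error element is 2*E/numel(E). The sketch below checks this analytic formula against finite differences, using a plain vector instead of the cell array in the example above. It shows the mathematical derivative only. The exact sign and scaling conventions of the values that dmse returns are not asserted here.

```matlab
E = [1; -2; 0.5];
X = [0; 0.2; -2.2; 4.1; 0.1; -0.2];

perf = mean(E .^ 2);               % mean squared error
dPerf_dE = 2 * E / numel(E);       % analytic derivative w.r.t. each error

% Central finite differences, one error element at a time
h = 1e-6;
fd = zeros(size(E));
for k = 1:numel(E)
  d = zeros(size(E)); d(k) = h;
  fd(k) = (mean((E + d).^2) - mean((E - d).^2)) / (2*h);
end

% mse does not depend on X directly, so its derivative w.r.t. X is zero
dPerf_dX = zeros(size(X));
```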


See Also: mse

dmsereg
Mean squared error with regularization performance derivative function

dPerf_dE = dmsereg('e',E,X,perf,PP)
dPerf_dX = dmsereg('x',E,X,perf,PP)

dmsereg is the derivative function for msereg.
dmsereg('e',E,X,perf,PP) takes these arguments,
E - Matrix or cell array of error vector(s).
X - Vector of all weight and bias values.
perf - Network performance (ignored).
PP - msereg performance parameters.

where PP defines one performance parameter,

PP.ratio - Relative importance of errors vs. weight and bias values,
and returns the derivative dPerf/dE.
dmsereg('x',E,X,perf,PP) returns the derivative dPerf/dX.

msereg has only one performance parameter.

Here we define an error E and X for a network with one 3-element output and
six weight and bias values.

E = {[1; -2; 0.5]};
X = [0; 0.2; -2.2; 4.1; 0.1; -0.2];

Here the ratio performance parameter is defined so that squared errors are 5
times as important as squared weight and bias values.

pp.ratio = 5/(5+1);
Here we calculate the network's performance, and derivatives of performance.

perf = msereg(E,X,pp)
dPerf_dE = dmsereg('e',E,X,perf,pp)
dPerf_dX = dmsereg('x',E,X,perf,pp)
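The role of ratio becomes clearer with the derivatives written out. The sketch below assumes that msereg is the weighted combination perf = ratio*mean(E.^2) + (1-ratio)*mean(X.^2). It labels that form as an assumption and does not call the toolbox. The code spot-checks one component of each derivative with a finite difference.

```matlab
E = [1; -2; 0.5];
X = [0; 0.2; -2.2; 4.1; 0.1; -0.2];
ratio = 5/(5+1);

% Assumed definition: ratio-weighted mean squared error plus
% (1-ratio)-weighted mean squared weights
perfOf = @(E, X) ratio*mean(E.^2) + (1 - ratio)*mean(X.^2);
perf = perfOf(E, X);

dPerf_dE = ratio * 2 * E / numel(E);
dPerf_dX = (1 - ratio) * 2 * X / numel(X);

% Spot-check one component of each derivative by central differences
h = 1e-6;
eK = [h; 0; 0];
fdE1 = (perfOf(E + eK, X) - perfOf(E - eK, X)) / (2*h);
xK = [0; 0; 0; h; 0; 0];
fdX4 = (perfOf(E, X + xK) - perfOf(E, X - xK)) / (2*h);
```

Raising ratio toward 1 shrinks the weight term, so the performance approaches plain mean squared error.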


See Also: msereg

dnetprod
Derivative of net input product function
dN_dZ = dnetprod(Z,N)

dnetprod is the net input derivative function for netprod.
dnetprod takes two arguments,

Z - SxQ weighted input.
N - SxQ net input.
and returns the SxQ derivative dN/dZ.

Here we define two weighted inputs for a layer with three neurons.

Z1 = [0; 1; -1];
Z2 = [1; 0.5; 1.2];

We calculate the layer's net input N with netprod and then the derivative of N
with respect to each weighted input.

N = netprod(Z1,Z2)
dN_dZ1 = dnetprod(Z1,N)
dN_dZ2 = dnetprod(Z2,N)


The derivative of a product with respect to any element of that product is the
product of the other elements.

See Also: netsum, netprod, dnetsum

dnetsum
Sum net input derivative function
dN_dZ = dnetsum(Z,N)

dnetsum is the net input derivative function for netsum.
dnetsum takes two arguments,

Z - SxQ weighted input.
N - SxQ net input.
and returns the SxQ derivative dN/dZ.

Here we define two weighted inputs for a layer with three neurons.

Z1 = [0; 1; -1];
Z2 = [1; 0.5; 1.2];

We calculate the layer's net input N with netsum and then the derivative of N
with respect to each weighted input.

N = netsum(Z1,Z2)
dN_dZ1 = dnetsum(Z1,N)
dN_dZ2 = dnetsum(Z2,N)


The derivative of a sum with respect to any element of that sum is always a
ones matrix that is the same size as the sum.

See Also: netsum, netprod, dnetprod

dotprod
Dot product weight function

Z = dotprod(W,P)

df = dotprod('deriv')

dotprod is the dot product weight function. Weight functions apply weights to
an input to get weighted inputs.

dotprod(W,P) takes these inputs,

W - SxR weight matrix.
P - RxQ matrix of Q input (column) vectors.
and returns the SxQ dot product of W and P.

Here we define a random weight matrix W and input vector P and calculate the
corresponding weighted input Z.

W = rand(4,3)
P = rand(3,1);
Z = dotprod(W,P)

You can create a standard network that uses dotprod by calling newp or
newlin.

To change a network so an input weight uses dotprod, set
net.inputWeights{i,j}.weightFcn to 'dotprod'. For a layer weight, set
net.layerWeights{i,j}.weightFcn to 'dotprod'.


In either case, call sim to simulate the network with dotprod. 


See newp and newlin for simulation examples. 


See Also: sim, ddotprod, dist, negdist, normprod

dposlin
Derivative of positive linear transfer function
dA_dN = dposlin(N,A)

dposlin is the derivative function for poslin.

dposlin(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.

and returns the SxQ derivative dA/dN.

Here we define the net input N for a layer of 3 poslin neurons.
N = [0.1; 0.8; -0.7];

We calculate the layer's output A with poslin and then the derivative of A with
respect to N.

A = poslin(N)
dA_dN = dposlin(N,A)


The derivative of poslin is calculated as follows: 


d = 1, if 0 <= n; 0, otherwise.

See Also: poslin

dpurelin
Linear transfer derivative function
dA_dN = dpurelin(N,A)

dpurelin is the derivative function for purelin.
dpurelin(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.
and returns the SxQ derivative dA/dN.

Here we define the net input N for a layer of 3 purelin neurons.
N = [0.1; 0.8; -0.7];

We calculate the layer's output A with purelin and then the derivative of A
with respect to N.

A = purelin(N)
dA_dN = dpurelin(N,A)

The derivative of purelin is calculated as follows:
D(i,q) = 1 


See Also: purelin

dradbas
Derivative of radial basis transfer function
dA_dN = dradbas(N,A)

dradbas is the derivative function for radbas.
dradbas(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.
and returns the SxQ derivative dA/dN.

Here we define the net input N for a layer of 3 radbas neurons.
N = [0.1; 0.8; -0.7]

We calculate the layer's output A with radbas and then the derivative of A with
respect to N.

A = radbas(N)
dA_dN = dradbas(N,A)

The derivative of radbas is calculated as follows:

d = -2*n*a
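Since radbas computes a = exp(-n^2), differentiating gives -2*n*exp(-n^2), which is -2*n*a. The plain-MATLAB sketch below checks this element-wise formula against a central finite difference without using the toolbox.

```matlab
N = [0.1; 0.8; -0.7];
A = exp(-N .^ 2);                 % radial basis output
dA_dN = -2 * N .* A;              % d = -2*n*a, element by element

% Central finite difference of exp(-n^2)
h = 1e-6;
fd = (exp(-(N + h).^2) - exp(-(N - h).^2)) / (2*h);
```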


See Also: radbas

dsatlin
Derivative of saturating linear transfer function
dA_dN = dsatlin(N,A)

dsatlin is the derivative function for satlin.
dsatlin(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.

and returns the SxQ derivative dA/dN.
Here we define the net input N for a layer of 3 satlin neurons.
N = [0.1; 0.8; -0.7];

We calculate the layer's output A with satlin and then the derivative of A with
respect to N.

A = satlin(N)
dA_dN = dsatlin(N,A)

The derivative of satlin is calculated as follows:

d = 1, if 0 <= n <= 1; 0, otherwise.
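The satlin output clips n to [0, 1], and the rule above marks where the clipped function has slope 1. The sketch below expresses both in plain MATLAB and adds a fourth net input above 1 to exercise the upper saturation. It illustrates the formulas and does not call the toolbox.

```matlab
N = [0.1; 0.8; -0.7; 1.5];        % includes one value above the linear region
A = min(max(N, 0), 1);            % satlin: clip n to [0, 1]
dA_dN = double(N >= 0 & N <= 1);  % 1 inside the linear region, 0 elsewhere
```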


See Also: satlin

dsatlins
Derivative of symmetric saturating linear transfer function
dA_dN = dsatlins(N,A)

dsatlins is the derivative function for satlins.
dsatlins(N,A) takes two arguments,

N - SxQ net input.
A - SxQ output.
and returns the SxQ derivative dA/dN.

Here we define the net input N for a layer of 3 satlins neurons.
N = [0.1; 0.8; -0.7]

We calculate the layer's output A with satlins and then the derivative of A
with respect to N.

A = satlins(N)
dA_dN = dsatlins(N,A)


The derivative of satlins is calculated as follows: 


d=1, 放 -1<=mn<=1;0, othervwise. 


Sat1ins 


dsse 





Purpose 


Synftax 


Descripfion 


Examples 


See Also 


Sum squared error performance derivative function 


dPerf_dE = dsse('e',E,X,perf,PP)
dPerf_dX = dsse('x',E,X,perf,PP)


dsse is the derivative function for sse. 
dsse('e',E,X,perf,PP) takes these arguments,


E - Matrix or cell array of error vector(s).

X - Vector of all weight and bias values. 

perf - Network performance (ignored). 

PP - Performance parameters (ignored).
and returns the derivative dPerf_dE. 


dsse('x',E,X,perf,PP) returns the derivative dPerf_dX.


Here we define an error E and X for a network with one 3-element output and 
six weight and bias values.


E = {[1; -2; 0.5]};
X = [0; 0.2; -2.2; 4.1; 0.1; -0.2];


Here we calculate the network's sum squared error performance, and 
derivatives of performance. 


perf = sse(E)
dPerf_dE = dsse('e',E,X)
dPerf_dX = dsse('x',E,X)


Note that sse can be called with only one argument and dsse with only three 
arguments because the other arguments are ignored. The other arguments 
exist so that sse and dsse conform to standard performance function argument
lists.
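The math behind these derivatives can be sketched in plain Python (not toolbox code). The sketch assumes the performance is the plain sum of squared error elements, so its derivative with respect to each error element is 2*e, and that the derivative with respect to X is zero because sse does not use X directly; the exact values dsse returns are not stated on this page and are an assumption here. The helper names dsse_dE and dsse_dX are hypothetical.

```python
def sse(E):
    """Sum of squared errors over every element of every error vector."""
    return sum(e * e for vec in E for e in vec)


def dsse_dE(E):
    """d(sum e^2)/de = 2*e, returned with the same shape as E."""
    return [[2.0 * e for e in vec] for vec in E]


def dsse_dX(E, X):
    """sse does not use X, so its derivative with respect to X is zero."""
    return [0.0] * len(X)


E = [[1.0, -2.0, 0.5]]   # one 3-element error vector
X = [0.0, 0.2, -2.2, 4.1, 0.1, -0.2]
perf = sse(E)
dPerf_dE = dsse_dE(E)
dPerf_dX = dsse_dX(E, X)
print(perf)      # 5.25
print(dPerf_dE)  # [[2.0, -4.0, 1.0]]
```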


sse
dtansig





Purpose
Syntax


Description


Examples 


Algorithm 


See Also 
Hyperbolic tangent sigmoid transfer derivative function
dA_dN = dtansig(N,A)


dtansig is the derivative function for tansig.
dtansig(N,A) takes two arguments,


N - SxQ net input.
A - SxQ output.
and returns the SxQ derivative dA/dN.


Here we define the net input N for a layer of 3 tansig neurons. 
N = [0.1; 0.8; -0.7]


We calculate the layer's output A with tansig and then the derivative of A with
respect to N. 


A = tansig(N) 
dA_dN = dtansig(N,A)


The derivative of tansig is calculated as follows:


d = 1-a^2 
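A plain-Python sketch (not toolbox code) that checks d = 1 - a^2 numerically. It assumes tansig computes a = 2/(1+exp(-2n)) - 1, which is mathematically equal to tanh(n), and compares the derivative with a central finite difference.

```python
import math


def tansig(n):
    """Hyperbolic tangent sigmoid: 2/(1+exp(-2n)) - 1, equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def dtansig(n, a):
    """Derivative expressed through the output: d = 1 - a^2."""
    return 1.0 - a * a


N = [0.1, 0.8, -0.7]
A = [tansig(n) for n in N]
dA_dN = [dtansig(n, a) for n, a in zip(N, A)]

# Check against math.tanh and against a central finite difference.
h = 1e-6
for n, a, d in zip(N, A, dA_dN):
    assert abs(a - math.tanh(n)) < 1e-12
    numeric = (tansig(n + h) - tansig(n - h)) / (2 * h)
    assert abs(numeric - d) < 1e-6

print(dA_dN)
```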


tansig, logsig, dlogsig


dtribas





Purpose
Syntax


Description


Examples 


Algorithm 


See Also 


Derivative of triangular basis transfer function 
dA_dN = dtribas(N,A)


dtribas is the derivative function for tribas.
dtribas(N,A) takes two arguments,


N - SxQ net input.
A - SxQ output.
and returns the SxQ derivative dA/dN.


Here we define the net input N for a layer of 3 tribas neurons.
N = [0.1; 0.8; -0.7];


We calculate the layer's output A with tribas and then the derivative of A with
respect to N.


A = tribas(N) 
dA_dN = dtribas(N,A)


The derivative of tribas is calculated as follows:


d = 1, if -1 <= n < 0; -1, if 0 < n <= 1; 0, otherwise.
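A plain-Python sketch of the piecewise rule (not toolbox code), assuming tribas computes a = 1 - |n| for -1 <= n <= 1 and 0 elsewhere. Following the rule as written, the derivative at exactly n = 0 is taken to be 0.

```python
def tribas(n):
    """Triangular basis: 1 - |n| for -1 <= n <= 1, 0 otherwise."""
    return 1.0 - abs(n) if -1.0 <= n <= 1.0 else 0.0


def dtribas(n, a):
    """Piecewise derivative: 1 on [-1, 0), -1 on (0, 1], 0 elsewhere (a unused)."""
    if -1.0 <= n < 0.0:
        return 1.0
    if 0.0 < n <= 1.0:
        return -1.0
    return 0.0


N = [0.1, 0.8, -0.7]
A = [tribas(n) for n in N]
dA_dN = [dtribas(n, a) for n, a in zip(N, A)]
print(dA_dN)  # [-1.0, -1.0, 1.0]
```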


tribas 
errsurf





Purpose Error surface of single-input neuron
Syntax errsurf(P,T,WV,BV,F)
Description errsurf(P,T,WV,BV,F) takes these arguments,


P - 1xQ matrix of input vectors.
T - 1xQ matrix of target vectors.
WV - Row vector of values of W. 
BV - Row vector of values of B. 
F - Transfer function (string). 


and returns a matrix of error values over WV and BV. 


Examples p = [-6.0 -6.1 -4.1 -4.0 +4.0 +4.1 +6.0 +6.1];
t = [+0.0 +0.0 +.97 +.99 +.01 +.03 +1.0 +1.0];

wv = -1:.1:1; bv = -2.5:.25:2.5;

es = errsurf(p,t,wv,bv,'logsig');

plotes(wv,bv,es,[60 30])
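The idea behind errsurf can be sketched in plain Python (not toolbox code) using the same data and grid as the example. Two points are assumptions of this sketch rather than statements from this page: the error at each grid point is taken to be the sum squared error of a = f(w*p + b) against t, and rows of the result correspond to bias values while columns correspond to weight values.

```python
import math


def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))


def errsurf(p, t, wv, bv, f):
    """Sum squared error of a = f(w*p + b) for every (w, b) grid point.

    Rows index bv and columns index wv (layout assumed for this sketch)."""
    return [[sum((ti - f(w * pi + b)) ** 2 for pi, ti in zip(p, t)) for w in wv]
            for b in bv]


p = [-6.0, -6.1, -4.1, -4.0, 4.0, 4.1, 6.0, 6.1]
t = [0.0, 0.0, 0.97, 0.99, 0.01, 0.03, 1.0, 1.0]
wv = [-1 + 0.1 * i for i in range(21)]     # like -1:.1:1
bv = [-2.5 + 0.25 * i for i in range(21)]  # like -2.5:.25:2.5
es = errsurf(p, t, wv, bv, logsig)

# Locate the grid point with the smallest error.
best = min((es[i][j], wv[j], bv[i]) for i in range(len(bv)) for j in range(len(wv)))
print(best)
```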


See Also plotes 
formx





Purpose
Syntax


Description


Examples 


See Also 


Form bias and weights into single vector 

X = formx(net,B,IW,LW) 

This function takes weight matrices and bias vectors for a network and
reshapes them into a single vector.

X = formx(net,B,IW,LW) takes these arguments,


net - Neural network.

B - Nlx1 cell array of bias vectors.

IW - NlxNi cell array of input weight matrices.
LW - NlxNl cell array of layer weight matrices.


and returns,
X - Vector of weight and bias values.
Here we create a network with a two-element input and one layer of three
neurons.
net = newff([0 1; -1 1],[3]);
We can view its weight matrices and bias vectors as follows:


b = net.b
iw = net.iw
lw = net.lw


We can put these values into a single vector as follows:


X = formx(net,net.b,net.iw,net.lw)
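The flattening idea can be sketched in plain Python for a single layer. The helpers form_x and split_x are hypothetical, and the ordering they use (biases first, then weights column by column) is an assumption of this sketch, not necessarily the order formx uses. The point is that flattening is only useful when an exact inverse exists, which is the role setx plays for getx.

```python
def form_x(b, W):
    """Flatten bias vector b (length S) and weight matrix W (S rows, R columns).

    Hypothetical layout: biases first, then W column by column."""
    x = list(b)
    for j in range(len(W[0])):
        for i in range(len(W)):
            x.append(W[i][j])
    return x


def split_x(x, S, R):
    """Inverse of form_x for the same hypothetical layout."""
    b = x[:S]
    W = [[x[S + j * S + i] for j in range(R)] for i in range(S)]
    return b, W


b = [0.1, -0.2, 0.3]                      # 3 neurons
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 neurons x 2 inputs
x = form_x(b, W)
print(x)  # [0.1, -0.2, 0.3, 1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
assert split_x(x, 3, 2) == (b, W)
```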


getx, setx
gensim





Purpose
Syntax
To Get Help


Description


Examples 
Generate a Simulink block for neural network simulation

gensim(net,st)

Type help network/gensim 

gensim(net,st) creates a Simulink system containing a block that simulates 
neural network net. 


gensim(net,st) takes these inputs,


net - Neural network. 

st - Sample time (default = 1). 
and creates a Simulink system containing a block that simulates neural 
network net with a sampling time of st. 


If net has no input or layer delays (net.numInputDelays and
net.numLayerDelays are both 0) then you can use -1 for st to get a 
continuously sampling network. 


net = newff([0 1],[5 1]); 
gensim(net) 


getx





Purpose
Syntax


Description


Examples 


See Also 


Get all network weight and bias values as a single vector
X = getx(net)


This function gets a network's weights and biases as a vector of values.
X = getx(net) takes

net - Neural network.

and returns
X - Vector of weight and bias values.


Here we create a network with a two-element input, and one layer of three 
neurons. 


net = newff([0 1; -1 1],[3]); 


We can get its weight and bias values as follows:


net.iw{1,1}
net.b{1} 


We can get these values as a single vector as follows:


X = getx(net); 


setx, formx
gridtop





Purpose
Syntax


Description


Examples


See Also
Grid layer topology function

pos = gridtop(dim1,dim2,...,dimN)

gridtop calculates neuron positions for layers whose neurons are arranged in
an N-dimensional grid.

gridtop(dim1,dim2,...,dimN) takes N arguments,


dimi - Length of layer in dimension i


and returns an NxS matrix of N coordinate vectors, where S is the product
dim1*dim2*...*dimN.
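A plain-Python sketch of the grid idea (not toolbox code). It assumes that coordinates are zero-based integers and that the first dimension varies fastest as the neuron index increases; both are assumptions about gridtop's exact output, which this page does not spell out. Neuron k is placed by reading k as a mixed-radix number whose digits are its coordinates.

```python
def gridtop(*dims):
    """Return an N x S nested list of grid coordinates, dimension 1 varying fastest."""
    S = 1
    for d in dims:
        S *= d
    pos = [[0] * S for _ in dims]
    for k in range(S):
        rem = k
        for i, d in enumerate(dims):
            pos[i][k] = rem % d   # coordinate of neuron k along dimension i
            rem //= d
    return pos


print(gridtop(2, 3))  # [[0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 2, 2]]
pos = gridtop(8, 5)   # the 40-neuron layer used in the example
```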


This code creates and displays a two-dimensional layer with 40 neurons 
arranged in an 8-by-5 grid.
pos = gridtop(8,5); plotsom(pos)


This code plots the connections between the same neurons, but shows each 
neuron at the location of its weight vector. The weights are generated randomly,
so the layer is very disorganized, as is evident in the plot generated by the
following code.


W = rands(40,2); plotsom(W,dist(pos))


hextop, randtop


hardlim 





Purpose 


Graph and 
Symbol 


Syntax 


Description


Examples 


Hard limit transfer function


[Figure: graph and symbol for a = hardlim(n), titled "Hard-Limit Transfer Function"]


A = hardlim(N)


info = hardlim(code) 


The hard limit transfer function forces a neuron to output a 1 if its net input
reaches a threshold; otherwise it outputs 0. This allows a neuron to make a
decision or classification. It can say yes or no. This kind of neuron is often
trained with the perceptron learning rule.


hardlim is a transfer function. Transfer functions calculate a layer's output
from its net input.


hardlim(N) takes one input,


N - SxQ matrix of net input (column) vectors.
and returns 1 where N is greater than or equal to 0, and 0 elsewhere.
hardlim(code) returns useful information for each code string:
'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.


Here is the code to create a plot of the hardlim transfer function.


n = -5:0.1:5;
a = hardlim(n);
plot(n,a)





Network Use


Algorithm 


See Also 
You can create a standard network that uses hardlim by calling newp.


To change a network so that a layer uses hardlim, set
net.layers{i}.transferFcn to 'hardlim'.


In either case, call sim to simulate the network with hardlim.


See newp for simulation examples.


The transfer function output is one if n is greater than or equal to 0 and zero if n
is less than 0.


hardlim(n) = 1, if n >= 0; 0 otherwise.
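The rule is simple enough to sketch in a few lines of plain Python (not toolbox code). The sketch mirrors the plotting example's range n = -5:0.1:5 and shows that an input of exactly 0 produces 1.

```python
def hardlim(n):
    """Hard limit: 1 if n >= 0, 0 otherwise."""
    return 1 if n >= 0 else 0


n = [x / 10 for x in range(-50, 51)]   # like n = -5:0.1:5
a = [hardlim(v) for v in n]
print(hardlim(-0.1), hardlim(0.0), hardlim(0.1))  # 0 1 1
```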


sim, hardlims


hardlims 





Purpose Symmetric hard limit transfer function 
Graph and
Symbol


[Figure: graph and symbol for a = hardlims(n), titled "Symmetric Hard-Limit Trans. Funct."]


Syntax A = hardlims(N) 


info = hardlims(code) 


Description The symmetric hard limit transfer function forces a neuron to output a 1 if its
net input reaches a threshold. Otherwise it outputs -1. Like the regular hard
limit function, this allows a neuron to make a decision or classification. It can
say yes or no.


hardlims is a transfer function. Transfer functions calculate a layer's output
from its net input.


hardlims(N) takes one input,

N - SxQ matrix of net input (column) vectors.
and returns 1 where N is greater than or equal to 0, and -1 elsewhere.
hardlims(code) returns useful information for each code string:


'deriv' - Name of derivative function.
'name' - Full name.

'output' - Output range.

'active' - Active input range.


Examples Here is the code to create a plot of the hardlims transfer function.


n = -5:0.1:5;
a = hardlims(n);
plot(n,a)





Network Use You can create a standard network that uses hardlims by calling newp. 


To change a network so that a layer uses hardlims, set
net.layers{i}.transferFcn to 'hardlims'.


In either case, call sim to simulate the network with hardlims.


See newp for simulation examples.


Algorithm The transfer function output is one if n is greater than or equal to 0 and -1
otherwise.


hardlims(n) = 1, if n >= 0; -1 otherwise.
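A matching plain-Python sketch (not toolbox code) of the symmetric version, again over the plotting example's range. The only difference from hardlim is the -1 output below the threshold.

```python
def hardlims(n):
    """Symmetric hard limit: 1 if n >= 0, -1 otherwise."""
    return 1 if n >= 0 else -1


n = [x / 10 for x in range(-50, 51)]   # like n = -5:0.1:5
a = [hardlims(v) for v in n]
print(hardlims(-0.1), hardlims(0.0), hardlims(0.1))  # -1 1 1
```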


See Also sim, hardlim
hextop





Purpose
Syntax


Description


Examples 


See Also 


Hexagonal layer topology function 

pos = hextop(dim1,dim2,...,dimN)

hextop calculates the neuron positions for layers whose neurons are arranged
in an N-dimensional hexagonal pattern.

hextop(dim1,dim2,...,dimN) takes N arguments,


dimi - Length of layer in dimension i


and returns an N-by-S matrix of N coordinate vectors, where S is the product
dim1*dim2*...*dimN.
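The two-dimensional case can be sketched in plain Python with a hypothetical helper hextop2 (not toolbox code). The layout is an assumption of this sketch: rows are sqrt(3)/2 apart and every odd row is shifted right by 0.5, which makes each neuron exactly one unit from its immediate neighbors, the defining property of a hexagonal arrangement.

```python
import math


def hextop2(dim1, dim2):
    """Two-dimensional hexagonal positions (hypothetical helper).

    Rows are sqrt(3)/2 apart and every odd row is shifted right by 0.5,
    so neighboring neurons are all one unit apart."""
    xs, ys = [], []
    for row in range(dim2):
        for col in range(dim1):
            xs.append(col + (0.5 if row % 2 else 0.0))
            ys.append(row * math.sqrt(3) / 2)
    return [xs, ys]


pos = hextop2(2, 3)
print(pos[0])  # [0.0, 1.0, 0.5, 1.5, 0.0, 1.0]
```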


This code creates and displays a two-dimensional layer with 40 neurons 
arranged in an 8-by-5 hexagonal pattern.
pos = hextop(8,5); plotsom(pos) 


This code plots the connections between the same neurons, but shows each 
neuron at the location of its weight vector. The weights are generated randomly,
so the layer is very disorganized, as is evident in the plot generated by the
following code.


W = rands(40,2); plotsom(W,dist(pos))


gridtop, randtop
hintonw





Purpose
Syntax


Description


Examples


See Also
Hinton graph of weight matrix
hintonw(W,maxw,minw)


hintonw(W,maxw,minw) takes these inputs,


W - SxR weight matrix.
maxw - Maximum weight, default = max(max(abs(W))).
minw - Minimum weight, default = maxw/100.
and displays a weight matrix represented as a grid of squares.


Each square's area represents a weight's magnitude. Each square's projection
(color) represents a weight's sign: inset (red) for negative weights, projecting
(green) for positive.


W = rands(4,5)
The following code displays the matrix graphically. 
hintonw(W) 


[Figure: Hinton graph of W; vertical axis labeled Neuron]





hintonwb 


hintonwb 





Purpose 
Syntax 


Description


Examples 


See Also 


Hinton graph of weight matrix and bias vector 
hintonwb(W,B,maxw,minw)


hintonwb(W,B,maxw,minw) takes these inputs,
W - SxR weight matrix.
B - Sx1 bias vector.
maxw - Maximum weight, default = max(max(abs(W))).
minw - Minimum weight, default = maxw/100.
and displays a weight matrix and a bias vector represented as a grid of squares.


Each square's area represents a weight's magnitude. Each square's projection
(color) represents a weight's sign: inset (red) for negative weights, projecting
(green) for positive. The weights are shown on the left.


The following code produces the result shown below. 


W = rands(4,5);
b = rands(4,1);
hintonwb(W,b)


[Figure: Hinton graph of W and b; vertical axis labeled Neuron]





hintonw 
ind2vec





Purpose
Syntax


Description


Examples


See Also
Convert indices to vectors

vec = ind2vec(ind)

ind2vec and vec2ind allow indices to be represented either by themselves or
as vectors containing a 1 in the row of the index they represent.
ind2vec(ind) takes one argument,


ind - Row vector of indices. 


and returns a sparse matrix of vectors, with one 1 in each column, as indicated
by ind.
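A plain-Python sketch of the conversion in both directions (not toolbox code). It uses dense nested lists instead of the sparse matrix ind2vec returns, and the vec2ind function here is a hypothetical helper written to mirror the toolbox function of that name.

```python
def ind2vec(ind):
    """Dense version: column j has a single 1 in row ind[j] (indices are 1-based)."""
    rows = max(ind)
    return [[1 if ind[j] == i + 1 else 0 for j in range(len(ind))] for i in range(rows)]


def vec2ind(vec):
    """Inverse: for each column, return the 1-based row index holding the 1."""
    return [next(i + 1 for i in range(len(vec)) if vec[i][j]) for j in range(len(vec[0]))]


ind = [1, 3, 2, 3]
vec = ind2vec(ind)
print(vec)           # [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
print(vec2ind(vec))  # [1, 3, 2, 3]
```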


Here four indices are defined and converted to vector representation.


ind = [1 3 2 3]
vec = ind2vec(ind)


vec2ind 
Purpose
Syntax
To Get Help


Description


Examples 


Algorithm 


Initialize a neural network 
net = init(net) 
Type help network/init


init(net) returns neural network net with weight and bias values updated
according to the network initialization function, indicated by net.initFcn, and
the parameter values, indicated by net.initParam.


Here a perceptron is created with a two-element input (with ranges of 0 to 1
and -2 to 2) and 1 neuron. Once it is created, we can display the neuron's
weights and bias.


net = newp([0 1;-2 2],1); 
net.iw{1,1} 
net.b{1} 


Training the perceptron alters its weight and bias values.


P = [0 1 0 1; 0 0 1 1];
T = [0 0 0 1];

net = train(net,P,T);
net.iw{1,1} 

net.b{1} 


init reinitializes those weight and bias values. 


net = init(net);
net.iw{1,1}
net.b{1} 


The weights and biases are zeros again, which are the initial values used by
perceptron networks (see newp).

init calls net.initFcn to initialize the weight and bias values according to the
parameter values net.initParam.


Typically, net.initFcn is set to 'initlay', which initializes each layer's
weights and biases according to its net.layers{i}.initFcn.
init





See Also 
Backpropagation networks have net.layers{i}.initFcn set to 'initnw',
which calculates the weight and bias values for layer i using the
Nguyen-Widrow initialization method.


Other networks have net.layers{i}.initFcn set to 'initwb', which
initializes each weight and bias with its own initialization function. The most
common weight and bias initialization function is rands, which generates
random values between -1 and 1.


sim, adapt, train, initlay, initnw, initwb, rands, revert


initcon 





Purpose 
Syntax


Description


Examples


Network Use


Algorithm 


See Also 


Conscience bias initialization function 

b = initcon(S,PR)

initcon is a bias initialization function that initializes biases for learning with
the learncon learning function.


initcon(S,PR) takes two arguments,


S - Number of rows (neurons).
PR - Rx2 matrix of R = [Pmin Pmax], default = [1 1].


and returns an Sx1 bias vector.
Note that for biases, R is always 1. initcon could also be used to initialize
weights, but it is not recommended for that purpose.
Here initial bias values are calculated for a 5-neuron layer.
b = initcon(5)
You can create a standard network that uses initcon to initialize biases by
calling newc.
To prepare the bias of layer i of a custom network to initialize with initcon:


1 Set net.initFcn to 'initlay'. (net.initParam will automatically become
initlay's default parameters.)


2 Set net.layers{i}.initFcn to 'initwb'.
3 Set net.biases{i}.initFcn to 'initcon'.


To initialize the network, call init. See newc for initialization examples.


learncon updates biases so that each bias value b(i) is a function of the
average output c(i) of the neuron i associated with the bias.


initcon gets initial bias values by assuming that each neuron has responded 
to equal numbers of vectors in the “past.” 
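A plain-Python sketch of this assumption (not toolbox code). If each of S neurons has responded to an equal share of past inputs, its average output is c = 1/S. The sketch further assumes the bias-conscience relation b = exp(1 - log(c)) from learncon's algorithm, which simplifies to b = e*S; whether initcon computes exactly this value is an assumption here.

```python
import math


def initcon(S):
    """Initial conscience biases for S neurons (sketch).

    Assumes every neuron has an equal past share c = 1/S and that bias and
    conscience are related by b = exp(1 - log(c))."""
    c = 1.0 / S
    return [math.exp(1.0 - math.log(c))] * S


b = initcon(5)
print(b)  # five equal values, each close to 5*e (about 13.59)
```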


initwb, initlay, init, learncon
initlay





Purpose


Syntax


Description


Network Use


Algorithm


See Also
Layer-by-layer network initialization function


net = initlay(net)
info = initlay(code)


initlay is a network initialization function that initializes each layer i
according to its own initialization function net.layers{i}.initFcn.


initlay(net) takes


net - Neural network.
and returns the network with each layer updated. initlay(code) returns
useful information for each code string:

'pnames' - Names of initialization parameters.

'pdefaults' - Default initialization parameters.
initlay does not have any initialization parameters.


You can create a standard network that uses initlay by calling newp, newlin,
newff, newcf, and many other new network functions.
To prepare a custom network to be initialized with initlay:


1 Set net.initFcn to 'initlay'. (This will set net.initParam to the empty
matrix [] since initlay has no initialization parameters.)


2 Set each net.layers{i}.initFcn to a layer initialization function.
(Examples of such functions are initwb and initnw.)


To initialize the network, call init. See newp and newlin for initialization
examples.


The weights and biases of each layer i are initialized according to
net.layers{i}.initFcn.


initwb, initnw, init


initnw





Purpose
Syntax


Description


Network Use


Algorithm


Nguyen-Widrow layer initialization function
net = initnw(net,i)


initnw is a layer initialization function that initializes a layer's weights and
biases according to the Nguyen-Widrow initialization algorithm. This
algorithm chooses values in order to distribute the active region of each neuron
in the layer approximately evenly across the layer's input space.


initnw(net,i) takes two arguments,
net - Neural network.
i - Index of a layer.
and returns the network with layer i's weights and biases updated.


You can create a standard network that uses initnw by calling newff or newcf. 


To prepare a custom network to be initialized with initnw: 


1 Set net.initFcn to 'initlay'. (This will set net.initParam to the empty
matrix [] since initlay has no initialization parameters.)


2 Set net.layers{i}.initFcn to 'initnw'.


To initialize the network, call init. See newff and newcf for training examples.


The Nguyen-Widrow method generates initial weight and bias values for a
layer so that the active regions of the layer's neurons will be distributed
approximately evenly over the input space.


Advantages over purely random weights and biases are:


• Few neurons are wasted (since all the neurons are in the input space).


• Training works faster (since each area of the input space has neurons).


The Nguyen-Widrow method can only be applied to layers


• With a bias
• With weights whose 'weightFcn' is dotprod
• With 'netInputFcn' set to netsum


If these conditions are not met, then initnw uses rands to initialize the layer's
weights and biases.
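The core idea can be sketched with the textbook Nguyen-Widrow recipe in plain Python (not toolbox code). This is a swapped-in formulation: it assumes inputs scaled to roughly [-1, 1], rescales each neuron's random weight row to length beta = 0.7 * S**(1/R), and draws each bias from [-beta, beta]. The toolbox's initnw also accounts for the transfer function's active input range and the actual input ranges, so its values will differ in detail.

```python
import math
import random


def nguyen_widrow(S, R, rng):
    """Textbook Nguyen-Widrow initialization for S neurons with R inputs.

    Each weight row is rescaled to length beta = 0.7 * S**(1/R) and each
    bias is drawn uniformly from [-beta, beta]."""
    beta = 0.7 * S ** (1.0 / R)
    W = []
    for _ in range(S):
        row = [rng.uniform(-0.5, 0.5) for _ in range(R)]
        norm = math.sqrt(sum(v * v for v in row))
        W.append([beta * v / norm for v in row])
    b = [rng.uniform(-beta, beta) for _ in range(S)]
    return W, b, beta


W, b, beta = nguyen_widrow(S=10, R=2, rng=random.Random(0))
print(beta)
```

Fixing the length of every weight row controls how steep each neuron's transfer region is, while spreading the biases places those regions at different points of the input range.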





See Also initwb, initlay, init
initwb





Purpose
Syntax


Description


Network Use


Algorithm


See Also


By-weight-and-bias layer initialization function

net = initwb(net,i)

initwb is a layer initialization function that initializes a layer's weights and
biases according to their own initialization functions.

initwb(net,i) takes two arguments,


net - Neural network.
i - Index of a layer.
and returns the network with layer i's weights and biases updated.


You can create a standard network that uses initwb by calling newp or newlin.

To prepare a custom network to be initialized with initwb:

1 Set net.initFcn to 'initlay'. (This will set net.initParam to the empty
matrix [] since initlay has no initialization parameters.)

2 Set net.layers{i}.initFcn to 'initwb'.


3 Set each net.inputWeights{i,j}.initFcn to a weight initialization
function. Set each net.layerWeights{i,j}.initFcn to a weight
initialization function. Set each net.biases{i}.initFcn to a bias
initialization function. (Examples of such functions are rands and
midpoint.)


To initialize the network, call init.


See newp and newlin for training examples.


Each weight (bias) in layer i is set to new values calculated according to its
weight (bias) initialization function.


initnw, initlay, init
initzero





Purpose


Syntax


Description


Examples


Network Use


See Also
Zero weight and bias initialization function


W = initzero(S,PR)
b = initzero(S,[1 1])


initzero(S,PR) takes two arguments,


S - Number of rows (neurons).
PR - Rx2 matrix of input value ranges = [Pmin Pmax].


and returns an SxR weight matrix of zeros.


initzero(S,[1 1]) returns an Sx1 bias vector of zeros.


Here initial weights and biases are calculated for a layer with two inputs
ranging over [0 1] and [-2 2], and 5 neurons.


W = initzero(5,[0 1; -2 2]) 
b = initzero(5,[1 1]) 


You can create a standard network that uses initzero to initialize its weights 
by calling newp or newlin. 


To prepare the weights and the bias of layer i of a custom network to be
initialized with initzero:


1 Set net.initFcn to 'initlay'. (net.initParam will automatically become
initlay's default parameters.)
2 Set net.layers{i}.initFcn to 'initwb'.


3 Set each net.inputWeights{i,j}.initFcn to 'initzero'. Set each
net.layerWeights{i,j}.initFcn to 'initzero'. Set each
net.biases{i}.initFcn to 'initzero'.


To initialize the network, call init. 


See newp or newlin for initialization examples. 


initwb, initlay, init


learncon 





Purpose 


Syntax 


Description


Conscience bias learning function 


[dB,LS] = learncon(B,P,Z,N,A,T,E,gW,gA,D,LP,LS)
info = learncon(code)
learncon is the conscience bias learning function used to increase the net input
to neurons that have the lowest average output until each neuron responds
approximately an equal percentage of the time.


learncon(B,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs,


B - Sx1 bias vector.

P - 1xQ ones vector.

Z - SxQ weighted input vectors.

N - SxQ net input vectors.

A - SxQ output vectors.

T - SxQ layer target vectors.

E - SxQ layer error vectors.

gW - SxR gradient with respect to performance.
gA - SxQ output gradient with respect to performance.
D - SxS neuron distances.

LP - Learning parameters, none, LP = [].


LS - Learning state, initially should be = [].
and returns

dB - Sx1 weight (or bias) change matrix.

LS - New learning state.


Learning occurs according to learncon's learning parameter, shown here with
its default value.


LP.lr - 0.001 - Learning rate.
learncon(code) returns useful information for each code string:
'pnames' - Names of learning parameters.
'pdefaults' - Default learning parameters.
'needg' - Returns 1 if this function uses gW or gA.


Neural Network Toolbox 2.0 compatibility: The LP.lr described above equals
1 minus the bias time constant used by trainc in Neural Network Toolbox 2.0.
Here we define a random output A and bias vector b for a layer with 3 neurons.
We also define the learning rate LR.


a = rand(3,1)
b = rand(3,1)
lp.lr = 0.5;


Since learncon only needs these values to calculate a bias change (see
algorithm below), we will use them to do so.


dW = learncon(b,[],[],[],a,[],[],[],[],[],lp,[])


To prepare the bias of layer i of a custom network to learn with learncon:
1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become
trainr's default parameters.)


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become
trains's default parameters.)


3 Set net.inputWeights{i}.learnFcn to 'learncon'. Set each
net.layerWeights{i,j}.learnFcn to 'learncon'. (Each weight learning
parameter property will automatically be set to learncon's default
parameters.)


To train the network (or enable it to adapt): 
1 Set net.trainParam (or net.adaptParam) properties as desired. 


2 Call train (or adapt). 


learncon calculates the bias change db for a given neuron by first updating
each neuron's conscience, i.e. the running average of its output:


c = (1-lr)*c + lr*a


The conscience is then used to compute a bias for the neuron that is greatest
for smaller conscience values.


db = exp(1-log(c)) - b


(Note that learncon is able to recover c each time it is called from the bias
values.)
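A plain-Python sketch of these steps (not toolbox code), interpreting the last step as the bias change db. Since exp(1 - log(c)) equals e/c, the conscience can be recovered from a positive bias as c = exp(1 - log(b)). In the usage below, two neurons start with the same conscience of 0.5; the one that responds (a = 1) has its bias lowered and the one that does not (a = 0) has its bias raised, which is how the rule evens out responses.

```python
import math


def learncon(b, a, lr=0.001):
    """Conscience bias change for each neuron (sketch of the steps above)."""
    db = []
    for bi, ai in zip(b, a):
        c = math.exp(1.0 - math.log(bi))  # recover conscience from the bias
        c = (1.0 - lr) * c + lr * ai      # update the running average of output
        db.append(math.exp(1.0 - math.log(c)) - bi)
    return db


b = [math.e / 0.5, math.e / 0.5]   # both neurons have conscience c = 0.5
db = learncon(b, a=[1.0, 0.0], lr=0.5)
print(db)
```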


learnk, learnos, adapt, train


learngd 





Purpose 


Syntax


Description


Gradient descent weight and bias learning function


[dW,LS] = learngd(W,P,Z,N,A,T,E,gW,gA,D,LP,LS)
[db,LS] = learngd(b,ones(1,Q),Z,N,A,T,E,gW,gA,D,LP,LS)


info = learngd(code) 


learngd is the gradient descent weight and bias learning function.


learngd(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs,


W - SxR weight matrix (or Sx1 bias vector).

P - RxQ input vectors (or ones(1,Q)).

Z - SxQ weighted input vectors.

N - SxQ net input vectors.

A - SxQ output vectors.

T - SxQ layer target vectors.

E - SxQ layer error vectors.

gW - SxR gradient with respect to performance.
gA - SxQ output gradient with respect to performance.
D - SxS neuron distances.

LP - Learning parameters, none, LP = [].


LS - Learning state, initially should be = [].


and returns,


dW - SxR weight (or bias) change matrix.

LS - New learning state.
Learning occurs according to learngd's learning parameter, shown here with
its default value.

LP.lr - 0.01 - Learning rate.
learngd(code) returns useful information for each code string:


'pnames' - Names of learning parameters.
'pdefaults' - Default learning parameters.
'needg' - Returns 1 if this function uses gW or gA.
Here we define a random gradient gW for a weight going to a layer with 3
neurons, from an input with 2 elements. We also define a learning rate of 0.5.


gW = rand(3,2)
lp.lr = 0.5;


Since learngd only needs these values to calculate a weight change (see
algorithm below), we will use them to do so.


dW = learngd([],[],[],[],[],[],[],gW,[],[],lp,[])


You can create a standard network that uses learngd with newff, newcf, or
newelm. To prepare the weights and the bias of layer i of a custom network to
adapt with learngd:


1 Set net.adaptFcn to 'trains'. net.adaptParam will automatically become
trains's default parameters.


2 Set each net.inputWeights{i,j}.learnFcn to 'learngd'. Set each
net.layerWeights{i,j}.learnFcn to 'learngd'. Set
net.biases{i}.learnFcn to 'learngd'. Each weight and bias learning
parameter property will automatically be set to learngd's default
parameters.


To allow the network to adapt:


1 Set net.adaptParam properties to desired values.
2 Call adapt with the network.


See newff or newcf for examples.
learngd calculates the weight change dW for a given neuron from the neuron's
input P and error E, and the weight (or bias) learning rate LR, according to
gradient descent:


dW = lr*gW
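A plain-Python sketch of this update (not toolbox code) for a weight going to 3 neurons from a 2-element input. The gradient values are made up for illustration; the sketch simply scales every element of gW by the learning rate, which is the whole of the rule as stated here.

```python
def learngd(gW, lr=0.01):
    """Gradient descent change: dW = lr * gW, element by element."""
    return [[lr * g for g in row] for row in gW]


gW = [[0.2, 0.4], [0.6, 0.8], [1.0, 1.2]]   # 3 neurons, 2 inputs (illustrative values)
dW = learngd(gW, lr=0.5)
print(dW)
```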


learngdm, newff, newcf, adapt, train


learngdm 





Purpose 


Syntax


Description


Gradient descent with momentum weight and bias learning function 


[dW,LS] = learngdm(W,P,Z,N,A,T,E,gW,gA,D,LP,LS)
[db,LS] = learngdm(b,ones(1,Q),Z,N,A,T,E,gW,gA,D,LP,LS)
info = learngdm(code)


learngdm is the gradient descent with momentum weight and bias learning
function.


learngdm(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs,


W - SxR weight matrix (or Sx1 bias vector).

P - RxQ input vectors (or ones(1,Q)).

Z - SxQ weighted input vectors.

N - SxQ net input vectors.

A - SxQ output vectors.

T - SxQ layer target vectors.

E - SxQ layer error vectors.

gW - SxR gradient with respect to performance.
gA - SxQ output gradient with respect to performance.
D - SxS neuron distances.

LP - Learning parameters, none, LP = [].


LS - Learning state, initially should be = [].


and returns,


dW - SxR weight (or bias) change matrix.

LS - New learning state.
Learning occurs according to learngdm's learning parameters, shown here with
their default values.

LP.lr - 0.01 - Learning rate.

LP.mc - 0.9 - Momentum constant.
learngdm(code) returns useful information for each code string:


'pnames' - Names of learning parameters.
'pdefaults' - Default learning parameters.
'needg' - Returns 1 if this function uses gW or gA.


Here we define a random gradient gW for a weight going to a layer with 3
neurons, from an input with 2 elements. We also define a learning rate of 0.5
and momentum constant of 0.8:

gW = rand(3,2)

lp.lr = 0.5;

lp.mc = 0.8;


Since learngdm only needs these values to calculate a weight change (see
algorithm below), we will use them to do so. We will use the default initial
learning state.


ls = [];
[dW,ls] = learngdm([],[],[],[],[],[],[],gW,[],[],lp,ls)


learngdm returns the weight change and a new learning state.


You can create a standard network that uses learngdm with newff, newcf, or newelm.


To prepare the weights and the bias of layer i of a custom network to adapt
with learngdm:


1 Set net.adaptFcn to 'trains'. net.adaptParam will automatically become
trains's default parameters.


2 Set each net.inputWeights{i,j}.learnFcn to 'learngdm'. Set each
net.layerWeights{i,j}.learnFcn to 'learngdm'. Set
net.biases{i}.learnFcn to 'learngdm'. Each weight and bias learning
parameter property will automatically be set to learngdm's default
parameters.


To allow the network to adapt:


1 Set net.adaptParam properties to desired values.
2 Call adapt with the network.


See newff or newcf for examples.







Algorithm learngdm calculates the weight change dW for a given neuron from the neuron's
input P and error E, the weight (or bias) W, learning rate LR, and momentum
constant MC, according to gradient descent with momentum:


dW = mc*dWprev + (1-mc)*lr*gW


The previous weight change dWprev is stored and read from the learning 
state LS. 
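A plain-Python sketch of the momentum rule (not toolbox code), using the example's lr = 0.5 and mc = 0.8. The previous change is passed back in as the learning state, and an empty state is treated as zeros, which is an assumption of this sketch. With a constant gradient of 1, the first change is (1-mc)*lr = 0.1 and later changes build up toward lr*gW = 0.5; this gradual build-up and smoothing is the point of the momentum term.

```python
def learngdm(gW, lr=0.01, mc=0.9, dW_prev=None):
    """Momentum update dW = mc*dWprev + (1-mc)*lr*gW.

    dW_prev plays the role of the learning state LS; None means no previous
    change and is treated here as zeros (an assumption)."""
    if dW_prev is None:
        dW_prev = [[0.0] * len(row) for row in gW]
    return [[mc * p + (1.0 - mc) * lr * g for p, g in zip(prow, grow)]
            for prow, grow in zip(dW_prev, gW)]


gW = [[1.0, -1.0]]
state = None
history = []
for _ in range(50):
    state = learngdm(gW, lr=0.5, mc=0.8, dW_prev=state)
    history.append(state[0][0])
print(history[0], history[-1])  # about 0.1, then close to 0.5
```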


See Also learngd, newff, newcf, adapt, train
Hebb weight learning rule


[dW,LS] = learnh(W,P,Z,N,A,T,E,gW,gA,D,LP,LS)


info = learnh(code)


learnh is the Hebb weight learning function.


learnh(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs,


W - SxR weight matrix (or Sx1 bias vector).
P - RxQ input vectors (or ones(1,Q)).

Z - SxQ weighted input vectors.

N - SxQ net input vectors.

A - SxQ output vectors.

T - SxQ layer target vectors.

E - SxQ layer error vectors.

gW - SxR gradient with respect to performance.
gA - SxQ output gradient with respect to performance.
D - SxS neuron distances.

LP - Learning parameters, none, LP = [].


LS - Learning state, initially should be = [].


and returns,


dW - SxR weight (or bias) change matrix.

LS - New learning state.
Learning occurs according to learnh's learning parameter, shown here with its
default value.

LP.lr - 0.01 - Learning rate.
learnh(code) returns useful information for each code string:


'pnames' - Names of learning parameters.
'pdefaults' - Default learning parameters.
'needg' - Returns 1 if this function uses gW or gA.


learnh 





Examples 


Network Use 


Algorithm 


See Also 


References 


Here we define a random input P and output A for a layer with a two-element
input and three neurons. We also define the learning rate LR.


p = rand(2,1);
a = rand(3,1);
lp.lr = 0.5;


Since learnh only needs these values to calculate a weight change (see
algorithm below), we will use them to do so.


dW = learnh([],p,[],[],a,[],[],[],[],[],lp,[])


To prepare the weights and the bias of layer i of a custom network to learn with
learnh:


1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become
trainr's default parameters.)


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become
trains's default parameters.)


3 Set each net.inputWeights{i,j}.learnFcn to 'learnh'. Set each
net.layerWeights{i,j}.learnFcn to 'learnh'. (Each weight learning
parameter property will automatically be set to learnh's default
parameters.)


To train the network (or enable it to adapt):


1 Set net.trainParam (net.adaptParam) properties to desired values.
2 Call train (adapt).


learnh calculates the weight change dW for a given neuron from the neuron's 
input P, output A, and learning rate LR according to the Hebb learning rule: 


dw = lr*a*p' 
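The Hebb rule is an outer product of the output and input vectors, scaled by the learning rate. The Python sketch below shows this for a single input column (Q = 1). The Python function learnh is a hypothetical stand-in that only mirrors the MATLAB name. Its matrices are lists of rows, and it is not the toolbox code.

```python
def learnh(p, a, lr=0.01):
    """Hebb rule dW = lr * a * p' for one input column (Q = 1).

    p: input vector of length R; a: output vector of length S.
    Returns an S x R list of rows (the outer product of a and p, scaled by lr).
    """
    return [[lr * ai * pj for pj in p] for ai in a]


dW = learnh(p=[1.0, 2.0], a=[0.5, 0.0, 1.0], lr=0.5)
# The neuron whose output is 0 (second row) gets no weight change.
```

The rule has no decay or normalization term. As a result, repeated presentations with positive a and p make the weights grow without limit. The decay term in learnhd counteracts this growth.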
learnhd, adapt, train 


Hebb, D.O., The Organization of Behavior, New York: Wiley, 1949. 
Purpose 


Syntax 


Description 
Hebb with decay weight learning rule 


[dW,LS] = learnhd(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnhd(code) 


learnhd is the Hebb with decay weight learning function. 


learnhd(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 
LS - New learning state. 
Learning occurs according to learnhd's learning parameters, shown here with 
their default values. 

LP.dr - 0.01 - Decay rate. 

LP.lr - 0.1 - Learning rate. 
learnhd(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


learnhd 





Examples 


Network Use 


Algorithm 


See Also 


Here we define a random input P, output A, and weights W for a layer with a 
two-element input and three neurons. We also define the decay and learning 
rates. 


p = rand(2,1); 
a = rand(3,1); 
w = rand(3,2); 
lp.dr = 0.05; 
lp.lr = 0.5; 


Since learnhd only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnhd(w,p,[],[],a,[],[],[],[],[],lp,[]) 


To prepare the weights and the bias of layer i of a custom network to learn with 
learnhd: 


1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become 
trainr's default parameters.) 


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnhd'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnhd'. (Each weight learning 
parameter property will automatically be set to learnhd's default 
parameters.) 


To train the network (or enable it to adapt): 
1 Set net.trainParam (net.adaptParam) properties to desired values. 


2 Call train (adapt). 


learnhd calculates the weight change dW for a given neuron from the neuron's 
input P, output A, decay rate DR, and learning rate LR according to the Hebb 
with decay learning rule: 


dw = lr*a*p' - dr*w 
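The next Python sketch of the Hebb-with-decay rule handles one input column. The function learnhd and the helper add_change are hypothetical Python names, and weights are stored as lists of rows. The demonstration turns off input activity so that only the decay term acts.

```python
def learnhd(w, p, a, lr=0.1, dr=0.01):
    """Hebb with decay, dW = lr*a*p' - dr*W, for one input column (Q = 1).

    w: S x R weights as a list of rows; p: length-R input; a: length-S output.
    """
    return [[lr * ai * pj - dr * wij for pj, wij in zip(p, w_row)]
            for ai, w_row in zip(a, w)]


def add_change(w, dW):
    """Apply a weight change elementwise (hypothetical helper)."""
    return [[wij + dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, dW)]


# With zero input and output, each step scales W by (1 - dr).
w = [[1.0, 1.0]]
for _ in range(10):
    w = add_change(w, learnhd(w, p=[0.0, 0.0], a=[0.0], lr=0.5, dr=0.1))
```

A weight stops changing when the Hebbian term balances the decay, that is, when lr*a*p' equals dr*W.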


learnh, adapt, train 
Description 
Instar weight learning function 


[dW,LS] = learnis(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnis(code) 


learnis is the instar weight learning function. 


learnis(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 
LS - New learning state. 
Learning occurs according to learnis's learning parameter, shown here with 
its default value. 

LP.lr - 0.01 - Learning rate. 
learnis(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


learnis 





Examples 


Network Use 


Algorithm 


See Also 


References 


Here we define a random input P, output A, and weight matrix W for a layer with 
a two-element input and three neurons. We also define the learning rate LR. 


p = rand(2,1); 
a = rand(3,1); 
w = rand(3,2); 
lp.lr = 0.5; 


Since learnis only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnis(w,p,[],[],a,[],[],[],[],[],lp,[]) 


To prepare the weights and the bias of layer i of a custom network so that it 
can learn with learnis: 


1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become 
trainr's default parameters.) 


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnis'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnis'. (Each weight learning 
parameter property will automatically be set to learnis's default 
parameters.) 


To train the network (or enable it to adapt): 
1 Set net.trainParam (net.adaptParam) properties to desired values. 


2 Call train (adapt). 


learnis calculates the weight change dW for a given neuron from the neuron's 
input P, output A, and learning rate LR according to the instar learning rule: 


dw = lr*a*(p'-w) 
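The following Python sketch reads the instar formula row by row: row i of W moves toward the input p in proportion to neuron i's output a(i). This element-wise reading is an assumption about how the S x 1 output scales the S x R difference. The function name learnis is a hypothetical Python mirror, not the toolbox code.

```python
def learnis(w, p, a, lr=0.01):
    """Instar rule dW = lr*a*(p' - W), one input column (Q = 1).

    dW[i][j] = lr * a[i] * (p[j] - W[i][j]); rows are lists.
    """
    return [[lr * ai * (pj - wij) for pj, wij in zip(p, w_row)]
            for ai, w_row in zip(a, w)]


w = [[0.0, 0.0], [1.0, 1.0]]
p = [1.0, 0.5]
for _ in range(50):
    dW = learnis(w, p, a=[1.0, 0.0], lr=0.2)
    w = [[wij + dij for wij, dij in zip(w_row, d_row)]
         for w_row, d_row in zip(w, dW)]
# Row 0 (active neuron) approaches p; row 1 (inactive neuron) is unchanged.
```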
learnk, learnos, adapt, train 


Grossberg, S., Studies of the Mind and Brain, Dordrecht, Holland: Reidel 
Press, 1982. 
Kohonen weight learning function 


[dW,LS] = learnk(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnk(code) 


learnk is the Kohonen weight learning function. 

learnk(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 
W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 
LS - New learning state. 
Learning occurs according to learnk's learning parameter, shown here with its 
default value. 

LP.lr - 0.01 - Learning rate. 
learnk(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


learnk 





Examples 


Network Use 


Algorithm 


See Also 


References 


Here we define a random input P, output A, and weight matrix W for a layer with 
a two-element input and three neurons. We also define the learning rate LR. 


p = rand(2,1); 
a = rand(3,1); 
w = rand(3,2); 
lp.lr = 0.5; 


Since learnk only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnk(w,p,[],[],a,[],[],[],[],[],lp,[]) 


To prepare the weights of layer i of a custom network to learn with learnk: 
1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become 
trainr's default parameters.) 


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnk'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnk'. (Each weight learning 
parameter property will automatically be set to learnk's default 
parameters.) 


To train the network (or enable it to adapt): 
1 Set net.trainParam (or net.adaptParam) properties as desired. 


2 Call train (or adapt). 


learnk calculates the weight change dW for a given neuron from the neuron's 
input P, output A, and learning rate LR according to the Kohonen learning rule: 


dw = lr*(p'-w), if a ~= 0; = 0, otherwise. 
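The following Python sketch applies the Kohonen rule to one input column. Any row whose output is nonzero moves a fraction lr of the way toward p, and all other rows stay put. The function name learnk is a hypothetical Python mirror, and the list-of-rows layout is an assumption.

```python
def learnk(w, p, a, lr=0.01):
    """Kohonen rule: dW(i,:) = lr*(p' - W(i,:)) if a[i] != 0, else 0."""
    return [[lr * (pj - wij) if ai != 0 else 0.0 for pj, wij in zip(p, w_row)]
            for ai, w_row in zip(a, w)]


# Competitive output: only neuron 0 won, so only row 0 moves (halfway, lr = 0.5).
dW = learnk(w=[[0.0, 0.0], [2.0, 2.0]], p=[1.0, 1.0], a=[1.0, 0.0], lr=0.5)
```

This differs from the instar rule in one way: the size of a nonzero output does not scale the step. It only decides whether the row moves at all.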
learnis, learnos, adapt, train 


Kohonen, T., Self-Organization and Associative Memory, New York: 
Springer-Verlag, 1984. 
Purpose LVQ1 weight learning function 


Syntax [dW,LS] = learnlv1(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnlv1(code) 


Description learnlv1 is the LVQ1 weight learning function. 


learnlv1(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R weight gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 
LS - New learning state. 
Learning occurs according to learnlv1's learning parameter, shown here with 
its default value. 

LP.lr - 0.01 - Learning rate. 
learnlv1(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 
Examples 


Network Use 


Algorithm 


See Also 


Here we define a random input P, output A, weight matrix W, and output 
gradient gA for a layer with a two-element input and three neurons. 


We also define the learning rate LR. 


p = rand(2,1); 
w = rand(3,2); 
a = compet(negdist(w,p)); 
gA = [-1;1;1]; 
lp.lr = 0.5; 


Since learnlv1 only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnlv1(w,p,[],[],a,[],[],[],gA,[],lp,[]) 


You can create a standard network that uses learnlv1 with newlvq. To prepare 
the weights of layer i of a custom network to learn with learnlv1: 


1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become 
trainr's default parameters.) 


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnlv1'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnlv1'. (Each weight learning 
parameter property will automatically be set to learnlv1's default 
parameters.) 


To train the network (or enable it to adapt): 
1 Set net.trainParam (or net.adaptParam) properties as desired. 


2 Call train (or adapt). 


learnlv1 calculates the weight change dW for a given neuron from the neuron's 
input P, output A, output gradient gA, and learning rate LR, according to the 
LVQ1 rule, given i the index of the neuron whose output a(i) is 1: 


dw(i,:) = +lr*(p'-w(i,:)) if gA(i) = 0 
        = -lr*(p'-w(i,:)) if gA(i) = -1 
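Here is a Python sketch of the LVQ1 rule for one input column. Only the winning row changes: it moves toward p when gA(i) = 0 and away from p when gA(i) = -1. The function name learnlv1 is a hypothetical Python mirror. This sketch takes gA from the caller, as the MATLAB example does, and raises an error for any other gA value because the rule above defines only these two cases.

```python
def learnlv1(w, p, a, gA, lr=0.01):
    """LVQ1 for one input column; only the winning row i (a[i] == 1) changes.

    gA[i] == 0  -> move W(i,:) toward p (winner has the correct class)
    gA[i] == -1 -> move W(i,:) away from p (winner has the wrong class)
    """
    dW = [[0.0] * len(p) for _ in w]
    i = a.index(1)                  # index of the winning neuron
    if gA[i] == 0:
        sign = 1.0
    elif gA[i] == -1:
        sign = -1.0
    else:
        raise ValueError("the LVQ1 rule above defines only gA(i) = 0 or -1")
    dW[i] = [sign * lr * (pj - wij) for pj, wij in zip(p, w[i])]
    return dW


w = [[0.0, 0.0], [1.0, 1.0]]
toward = learnlv1(w, p=[1.0, 0.0], a=[1, 0], gA=[0, 0], lr=0.5)
away = learnlv1(w, p=[1.0, 0.0], a=[1, 0], gA=[-1, 0], lr=0.5)
```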


learnlv2, adapt, train 
Description 




LVQ2.1 weight learning function 


[dW,LS] = learnlv2(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnlv2(code) 


learnlv2 is the LVQ2 weight learning function. 
learnlv2(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R weight gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 

LS - New learning state. 
Learning occurs according to learnlv2's learning parameters, shown here with 
their default values. 

LP.lr - 0.01 - Learning rate. 

LP.window - 0.25 - Window size (0 to 1, typically 0.2 to 0.3). 


learnlv2(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


learnlv2 





Examples 


Network Use 


Algorithm 


Here we define a sample input P, output A, weight matrix W, and output 
gradient gA for a layer with a two-element input and three neurons. 


We also define the learning rate LR. 


p = rand(2,1); 
w = rand(3,2); 
n = negdist(w,p); 
a = compet(n); 
gA = [-1;1;1]; 
lp.lr = 0.5; 


Since learnlv2 only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnlv2(w,p,[],n,a,[],[],[],gA,[],lp,[]) 


You can create a standard network that uses learnlv2 with newlvq. 
To prepare the weights of layer i of a custom network to learn with learnlv2: 


1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become 
trainr's default parameters.) 

2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 

3 Set each net.inputWeights{i,j}.learnFcn to 'learnlv2'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnlv2'. (Each weight learning 
parameter property will automatically be set to learnlv2's default 
parameters.) 


To train the network (or enable it to adapt): 


1 Set net.trainParam (or net.adaptParam) properties as desired. 
2 Call train (or adapt). 


learnlv2 implements Learning Vector Quantization 2.1, which works as 
follows: 


For each presentation, if the winning neuron i should not have won, and the 
runner up j should have, and the distance di between the winning neuron and 
the input p is roughly equal to the distance dj from the runner up neuron to 
the input p according to the given window, 

min(di/dj, dj/di) > (1-window)/(1+window) 


then move the winning neuron i weights away from the input vector, and move 
the runner up neuron j weights toward the input according to: 


dw(i,:) = - lp.lr*(p'-w(i,:)) 
dw(j,:) = + lp.lr*(p'-w(j,:)) 
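The Python sketch below checks the window condition and applies the two LVQ2.1 updates. It uses Euclidean distances and handles one input column. The function name learnlv2 is a hypothetical Python mirror. The sketch also assumes the caller has already decided which neuron i should not have won and which runner-up j should have; in the toolbox, that information comes from gA.

```python
import math


def learnlv2(w, p, i, j, lr=0.01, window=0.25):
    """LVQ2.1 for one input column.

    i: winning neuron that should not have won; j: runner-up that should have.
    Both rows move only when p falls inside the window between them.
    """
    dW = [[0.0] * len(p) for _ in w]
    di = math.dist(w[i], p)
    dj = math.dist(w[j], p)
    if di == 0 or dj == 0:
        return dW  # min(di/dj, dj/di) would be 0, so p is outside the window
    if min(di / dj, dj / di) > (1 - window) / (1 + window):
        dW[i] = [-lr * (pk - wk) for pk, wk in zip(p, w[i])]  # away from p
        dW[j] = [+lr * (pk - wk) for pk, wk in zip(p, w[j])]  # toward p
    return dW


w = [[0.0, 0.0], [2.0, 0.0]]
inside = learnlv2(w, p=[1.0, 0.0], i=0, j=1, lr=0.5)   # di == dj: update
outside = learnlv2(w, p=[0.1, 0.0], i=0, j=1, lr=0.5)  # di << dj: no update
```

With the default window of 0.25, the threshold is 0.75/1.25 = 0.6. Therefore the two distances must be within a factor of about 1.67 of each other before either row moves.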


See Also learnlv1, adapt, train 
Purpose 


Syntax 


Description 


Outstar weight learning function 


[dW,LS] = learnos(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnos(code) 


learnos is the outstar weight learning function. 
learnos(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R weight gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 

LS - New learning state. 
Learning occurs according to learnos's learning parameter, shown here with 
its default value. 

LP.lr - 0.01 - Learning rate. 


learnos(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 
Examples 


Network Use 


Algorithm 


See Also 


References 
Here we define a random input P, output A, and weight matrix W for a layer with 
a two-element input and three neurons. We also define the learning rate LR. 


p = rand(2,1); 
a = rand(3,1); 
w = rand(3,2); 
lp.lr = 0.5; 


Since learnos only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnos(w,p,[],[],a,[],[],[],[],[],lp,[]) 


To prepare the weights and the bias of layer i of a custom network to learn with 
learnos: 


1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become 
trainr's default parameters.) 


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnos'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnos'. (Each weight learning 
parameter property will automatically be set to learnos's default 
parameters.) 


To train the network (or enable it to adapt): 
1 Set net.trainParam (net.adaptParam) properties to desired values. 


2 Call train (adapt). 


learnos calculates the weight change dW for a given neuron from the neuron's 
input P, output A, and learning rate LR according to the outstar learning rule: 


dw = lr*(a-w)*p' 
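The following Python sketch reads the outstar formula element by element: dW[i][j] = lr*(a[i] - W[i][j])*p[j]. Under this reading, column j of W moves toward the output pattern a whenever input j is active. That interpretation of the compact matrix notation is an assumption. The function name learnos is a hypothetical Python mirror, not the toolbox code.

```python
def learnos(w, p, a, lr=0.01):
    """Outstar rule dW = lr*(a - W)*p', one input column (Q = 1).

    dW[i][j] = lr * (a[i] - W[i][j]) * p[j]; rows are lists.
    """
    return [[lr * (ai - wij) * pj for wij, pj in zip(w_row, p)]
            for ai, w_row in zip(a, w)]


# Associate output pattern `target` with input 0; input 1 stays inactive.
w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
target = [1.0, 0.0, 0.5]
for _ in range(50):
    dW = learnos(w, p=[1.0, 0.0], a=target, lr=0.2)
    w = [[wij + dij for wij, dij in zip(w_row, d_row)]
         for w_row, d_row in zip(w, dW)]
# Column 0 approaches target; column 1 stays at 0.
```

Compare this with the instar rule, which moves rows of W toward the input. The outstar rule instead moves columns of W toward the output, which lets an active input learn to recall a stored pattern.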
learnis, learnk, adapt, train 


Grossberg, S., Studies of the Mind and Brain, Dordrecht, Holland: Reidel 
Press, 1982. 
Purpose 


Syntax 


Description 


Examples 


Perceptron weight and bias learning function 


[dW,LS] = learnp(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 
[db,LS] = learnp(b,ones(1,Q),Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnp(code) 


learnp is the perceptron weight/bias learning function. 
learnp(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or b, an S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R weight gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 
LS - New learning state. 
learnp(code) returns useful information for each code string: 
'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


Here we define a random input P and error E to a layer with a two-element 
input and three neurons. 


p = rand(2,1); 
Network Use 


Algorithm 


See Also 
e = rand(3,1); 


Since learnp only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnp([],p,[],[],[],[],e,[],[],[],[],[]) 


You can create a standard network that uses learnp with newp. 


To prepare the weights and the bias of layer i of a custom network to learn with 
learnp: 


1 Set net.trainFcn to 'trainb'. (net.trainParam will automatically become 
trainb's default parameters.) 


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnp'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnp'. Set 
net.biases{i}.learnFcn to 'learnp'. (Each weight and bias learning 
parameter property will automatically become the empty matrix since 
learnp has no learning parameters.) 


To train the network (or enable it to adapt): 


1 Set net.trainParam (net.adaptParam) properties to desired values. 
2 Call train (adapt). 


See newp for adaption and training examples. 


learnp calculates the weight change dW for a given neuron from the neuron's 
input P and error E according to the perceptron learning rule: 


dw = 0,   if e = 0 
   = p',  if e = 1 
   = -p', if e = -1 

This can be summarized as: 

dw = e*p' 
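The following Python sketch applies the perceptron rule dW = e*p' to a single hard-limit neuron learning logical AND, a linearly separable problem. The bias is updated with the same rule using a constant input of 1, which mirrors the learnp(b,ones(1,Q),...) form above. The names learnp and hardlim are hypothetical Python stand-ins. The 100-epoch cap is an arbitrary choice for this sketch.

```python
def learnp(p, e):
    """Perceptron rule dW = e * p' for one input vector p and error vector e."""
    return [[ei * pj for pj in p] for ei in e]


def hardlim(n):
    """Hard-limit transfer: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0


# Train one neuron on logical AND, a linearly separable problem.
P = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 0, 0, 1]
w, b = [0, 0], 0
for epoch in range(100):
    errors = 0
    for p, t in zip(P, T):
        e = t - hardlim(w[0] * p[0] + w[1] * p[1] + b)
        if e != 0:
            errors += 1
            dW = learnp(p, [e])[0]        # weight change for the single neuron
            db = learnp([1], [e])[0][0]   # bias: same rule with input 1
            w = [wi + dwi for wi, dwi in zip(w, dW)]
            b += db
    if errors == 0:                       # a full pass with no mistakes
        break
```

Because AND is linearly separable, the loop stops after a clean pass. For a target set that is not linearly separable, such as XOR, it would simply run until it reached the epoch cap.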


learnpn, newp, adapt, train 


learnp 





References Rosenblatt, F., Principles of Neurodynamics, Washington D.C.: Spartan Press, 
1961. 
Purpose Normalized perceptron weight and bias learning function 


Syntax [dW,LS] = learnpn(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnpn(code) 


Description learnpn is a weight and bias learning function. It can result in faster learning 
than learnp when input vectors have widely varying magnitudes. 


learnpn(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R weight gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 
LS - New learning state. 
learnpn(code) returns useful information for each code string: 
'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


Examples Here we define a random input P and error E to a layer with a two-element 
input and three neurons. 


p = rand(2,1); 
Network Use 


Algorithm 


e = rand(3,1); 


Since learnpn only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnpn([],p,[],[],[],[],e,[],[],[],[],[]) 


You can create a standard network that uses learnpn with newp. 


To prepare the weights and the bias of layer i of a custom network to learn with 
learnpn: 


1 Set net.trainFcn to 'trainb'. (net.trainParam will automatically become 
trainb's default parameters.) 


2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnpn'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnpn'. Set 
net.biases{i}.learnFcn to 'learnpn'. (Each weight and bias learning 
parameter property will automatically become the empty matrix since 
learnpn has no learning parameters.) 


To train the network (or enable it to adapt): 


1 Set net.trainParam (net.adaptParam) properties to desired values. 
2 Call train (adapt). 


See newp for adaption and training examples. 


learnpn calculates the weight change dW for a given neuron from the neuron's 
input P and error E according to the normalized perceptron learning rule: 


pn = p / sqrt(1 + p(1)^2 + p(2)^2 + ... + p(R)^2) 
dw = 0,    if e = 0 
   = pn',  if e = 1 
   = -pn', if e = -1 


The expression for dW can be summarized as: 


dw = e*pn' 
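The next Python sketch implements the normalization above for one input column and then applies dw = e*pn'. The function name learnpn is a hypothetical Python mirror. The example compares a small input with a very large one to show why this rule can help when input magnitudes vary widely.

```python
import math


def learnpn(p, e):
    """Normalized perceptron rule dW = e * pn', one input column, where
    pn = p / sqrt(1 + p(1)^2 + ... + p(R)^2).
    """
    norm = math.sqrt(1 + sum(pj * pj for pj in p))
    pn = [pj / norm for pj in p]
    return [[ei * pnj for pnj in pn] for ei in e]


small = learnpn([0.5, 0.0], [1])
large = learnpn([500.0, 0.0], [1])
# Every entry of pn has magnitude below 1, so an outlier input cannot
# produce an outsized weight change the way learnp's e*p' would.
```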
Limitations 


See Also 
Perceptrons do have one real limitation. The set of input vectors must be 
linearly separable if a solution is to be found. That is, if the input vectors with 
targets of 1 cannot be separated by a line or hyperplane from the input vectors 
associated with values of 0, the perceptron will never be able to classify them 
correctly. 


learnp, newp, adapt, train 
Purpose 


Syntax 


Description 


Self-organizing map weight learning function 


[dW,LS] = learnsom(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 


info = learnsom(code) 


learnsom is the self-organizing map weight learning function. 
learnsom(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R weight gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 
LS - New learning state. 


Learning occurs according to learnsom's learning parameters, shown here with 
their default values. 


LP.order_lr - 0.9 - Ordering phase learning rate. 
LP.order_steps - 1000 - Ordering phase steps. 
LP.tune_lr - 0.02 - Tuning phase learning rate. 
LP.tune_nd - 1 - Tuning phase neighborhood distance. 
Examples 


Network Use 
learnsom(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


Here we define a random input P, output A, and weight matrix W for a layer 
with a two-element input and six neurons. We also calculate positions and 
distances for the neurons, which are arranged in a 2-by-3 hexagonal pattern. 
Then we define the four learning parameters. 


p = rand(2,1); 
a = rand(6,1); 
w = rand(6,2); 
pos = hextop(2,3); 
d = linkdist(pos); 
lp.order_lr = 0.9; 
lp.order_steps = 1000; 
lp.tune_lr = 0.02; 
lp.tune_nd = 1; 


Since learnsom only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 

ls = []; 

[dW,ls] = learnsom(w,p,[],[],a,[],[],[],[],d,lp,ls) 


You can create a standard network that uses learnsom with newsom. 
To prepare the weights of layer i of a custom network to learn with learnsom: 


1 Set net.trainFcn to 'trainr'. (net.trainParam will automatically become 
trainr's default parameters.) 

2 Set net.adaptFcn to 'trains'. (net.adaptParam will automatically become 
trains's default parameters.) 

3 Set each net.inputWeights{i,j}.learnFcn to 'learnsom'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnsom'. Set 
net.biases{i}.learnFcn to 'learnsom'. (Each weight learning parameter 
property will automatically be set to learnsom's default parameters.) 


To train the network (or enable it to adapt): 


1 Set net.trainParam (net.adaptParam) properties to desired values. 







2 Call train (adapt). 


Algorithm learnsom calculates the weight change dW for a given neuron from the neuron's 
input P, activation A2, and learning rate LR: 


dw = lr*a2*(p'-w) 


where the activation A2 is found from the layer output A and neuron distances 
D and the current neighborhood size ND: 


a2(i,q) = 1,   if a(i,q) = 1 
        = 0.5, if a(j,q) = 1 and D(i,j) <= nd 
        = 0,   otherwise 
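The following Python sketch computes the neighborhood activation a2 for one input column and then the weight change dw = lr*a2*(p'-w), with row i scaled by a2(i). The names som_a2 and learnsom_dw are hypothetical Python functions, not toolbox calls. D is a plain S x S list of distances, as linkdist would produce.

```python
def som_a2(a, D, nd):
    """Neighborhood activation a2 for one input column.

    a: competitive output (1 for the winner, 0 elsewhere); D: S x S distances.
    """
    j = a.index(1)  # winning neuron
    return [1.0 if i == j else (0.5 if D[i][j] <= nd else 0.0)
            for i in range(len(a))]


def learnsom_dw(w, p, a, D, nd, lr):
    """dW = lr*a2*(p' - W), with row i of W scaled by a2[i]."""
    a2 = som_a2(a, D, nd)
    return [[lr * a2i * (pj - wij) for pj, wij in zip(p, w_row)]
            for a2i, w_row in zip(a2, w)]


# Three neurons on a line: 0 - 1 - 2, one link apart.
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
dW = learnsom_dw(w, p=[1.0, 1.0], a=[1, 0, 0], D=D, nd=1, lr=0.5)
# Winner moves with a2 = 1, its neighbor with a2 = 0.5, neuron 2 not at all.
```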


The learning rate LR and neighborhood size ND are altered through two phases: 
an ordering phase and a tuning phase. 


The ordering phase lasts as many steps as LP.order_steps. During this 
phase LR is adjusted from LP.order_lr down to LP.tune_lr, and ND is adjusted 
from the maximum neuron distance down to 1. It is during this phase that 
neuron weights are expected to order themselves in the input space consistent 
with the associated neuron positions. 


During the tuning phase LR decreases slowly from LP.tune_lr and ND is always 
set to LP.tune_nd. During this phase the weights are expected to spread out 
relatively evenly over the input space while retaining their topological order 
found during the ordering phase. 
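The text gives the endpoints of each phase but not the exact shape of the schedule. The following Python sketch therefore makes two explicit assumptions. First, LR and ND fall linearly during the ordering phase. Second, LR decays as tune_lr*order_steps/step during the tuning phase. The function som_schedule is hypothetical. Its default arguments are the LP defaults listed above.

```python
def som_schedule(step, max_dist, order_lr=0.9, order_steps=1000,
                 tune_lr=0.02, tune_nd=1):
    """Return (lr, nd) for a given step.

    ASSUMPTION: linear interpolation during ordering and a 1/step decay
    during tuning; the text only says the values go "down" and that LR
    decreases "slowly" in the tuning phase.
    """
    if step < order_steps:
        frac = 1 - step / order_steps          # 1 at the start, toward 0 at the end
        lr = tune_lr + (order_lr - tune_lr) * frac
        nd = 1 + (max_dist - 1) * frac         # from max neuron distance toward 1
    else:
        lr = tune_lr * order_steps / max(step, 1)
        nd = tune_nd
    return lr, nd


start = som_schedule(0, max_dist=4)      # ordering begins
middle = som_schedule(500, max_dist=4)   # halfway through ordering
late = som_schedule(2000, max_dist=4)    # well into tuning
```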


See Also adapt, train 
Purpose 


Syntax 


Description 
Widrow-Hoff weight/bias learning function 


[dW,LS] = learnwh(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) 
[db,LS] = learnwh(b,ones(1,Q),Z,N,A,T,E,gW,gA,D,LP,LS) 
info = learnwh(code) 


learnwh is the Widrow-Hoff weight/bias learning function, and is also known 
as the delta or least mean squared (LMS) rule. 


learnwh(W,P,Z,N,A,T,E,gW,gA,D,LP,LS) takes several inputs, 


W - S x R weight matrix (or b, an S x 1 bias vector). 
P - R x Q input vectors (or ones(1,Q)). 
Z - S x Q weighted input vectors. 
N - S x Q net input vectors. 
A - S x Q output vectors. 
T - S x Q layer target vectors. 
E - S x Q layer error vectors. 
gW - S x R weight gradient with respect to performance. 
gA - S x Q output gradient with respect to performance. 
D - S x S neuron distances. 
LP - Learning parameters, none, LP = []. 
LS - Learning state, initially should be = []. 


and returns, 


dW - S x R weight (or bias) change matrix. 

LS - New learning state. 
Learning occurs according to learnwh's learning parameter, shown here with 
its default value. 

LP.lr - 0.01 - Learning rate. 
learnwh(code) returns useful information for each code string: 


'pnames' - Names of learning parameters. 
'pdefaults' - Default learning parameters. 
'needg' - Returns 1 if this function uses gW or gA. 


learnwh 





Examples 


Network Use 


Algorithm 


Here we define a random input P and error E to a layer with a two-element 
input and three neurons. We also define the learning rate LR learning 
parameter. 


p = rand(2,1); 
e = rand(3,1); 
lp.lr = 0.5; 


Since learnwh only needs these values to calculate a weight change (see 
algorithm below), we will use them to do so. 


dW = learnwh([],p,[],[],[],[],e,[],[],[],lp,[]) 


You can create a standard network that uses learnwh with newlin. 


To prepare the weights and the bias of layer i of a custom network to learn with 
learnwh: 


1 Set net.trainFcn to 'trainb'. net.trainParam will automatically become 
trainb's default parameters. 


2 Set net.adaptFcn to 'trains'. net.adaptParam will automatically become 
trains's default parameters. 


3 Set each net.inputWeights{i,j}.learnFcn to 'learnwh'. Set each 
net.layerWeights{i,j}.learnFcn to 'learnwh'. Set 
net.biases{i}.learnFcn to 'learnwh'. 


Each weight and bias learning parameter property will automatically be set to 
learnwh's default parameters. 


To train the network (or enable it to adapt): 


1 Set net.trainParam (net.adaptParam) properties to desired values. 
2 Call train (adapt). 


See newlin for adaption and training examples. 


learnwh calculates the weight change dW for a given neuron from the neuron's 
input P and error E, and the weight (or bias) learning rate LR, according to the 
Widrow-Hoff learning rule: 


dw = lr*e*p' 
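The following Python sketch applies the Widrow-Hoff rule incrementally to fit one linear neuron, a = w*x + b, to three points on t = 2x + 1. The bias uses the same rule with a constant input of 1. The function name learnwh is a hypothetical Python mirror. The learning rate of 0.2 and the 500 epochs are arbitrary choices for this noise-free example, not toolbox defaults.

```python
def learnwh(p, e, lr=0.01):
    """Widrow-Hoff (LMS) rule dW = lr*e*p' for one input column (Q = 1)."""
    return [[lr * ei * pj for pj in p] for ei in e]


# Fit one linear neuron a = w*x + b to points on t = 2x + 1.
samples = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
w, b = 0.0, 0.0
for epoch in range(500):
    for x, t in samples:
        e = t - (w * x + b)                      # error of the linear neuron
        w += learnwh([x], [e], lr=0.2)[0][0]     # weight change
        b += learnwh([1.0], [e], lr=0.2)[0][0]   # bias: same rule, input 1
```

Each step moves the parameters a small distance against the gradient of the squared error, so the fit approaches w = 2 and b = 1. If the learning rate is too large, the same loop diverges instead of converging.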
See Also newlin, adapt, train 


References Widrow, B., and M. E. Hoff, "Adaptive switching circuits," 1960 IRE WESCON 
Convention Record, New York IRE, pp. 96-104, 1960. 


Widrow, B., and S. D. Stearns, Adaptive Signal Processing, New York: 
Prentice-Hall, 1985. 
Purpose Link distance function 
Syntax d = linkdist(pos) 
Description linkdist is a layer distance function used to find the distances between the 
layer's neurons given their positions. 
linkdist(pos) takes one argument, 


pos - N x S matrix of neuron positions. 
and returns the S x S matrix of distances. 


Examples Here we define a random matrix of positions for 10 neurons arranged in 
three-dimensional space and find their distances. 


pos = rand(3,10); 
D = linkdist(pos) 


Network Use You can create a standard network that uses linkdist as a distance function 
by calling newsom. 


To change a network so that a layer's topology uses linkdist, set 
net.layers{i}.distanceFcn to 'linkdist'. 


In either case, call sim to simulate the network with linkdist. See newsom for 
training and adaption examples. 


Algorithm The link distance D between two position vectors Pi and Pj from a set of S 
vectors is: 


Dij = 0, if i == j 
    = 1, if (sum((Pi-Pj).^2)).^0.5 is <= 1 
    = 2, if k exists, Dik = Dkj = 1 
    = 3, if k1, k2 exist, Dik1 = Dk1k2 = Dk2j = 1 
    = N, if k1..kN exist, Dik1 = Dk1k2 = ... = DkNj = 1 
    = S, if none of the above conditions apply 

See Also sim, dist, mandist 
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Purpose Log sigmoid transfer function 
Graph and 
Symbol 电 





QG= logsig(1) 
Log-Sigmoid Transfer Function 


Syntax A = 1ogsig(N) 
info = 1ogsig(code) 

Description logsig is a transfer function. Transfer functions calculate a layer's output from
its net input.

logsig(N) takes one input,

N - S x Q matrix of net input (column) vectors.

and returns each element of N squashed between 0 and 1.

logsig(code) returns useful information for each code string:

'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.


Examples Here is the code to create a plot of the logsig transfer function.

n = -5:0.1:5;
a = logsig(n);
plot(n,a)

Network Use You can create a standard network that uses logsig by calling newff or newcf.
To change a network so a layer uses logsig, set net.layers{i}.transferFcn
to 'logsig'.

In either case, call sim to simulate the network with logsig.
See newff or newcf for simulation examples.

Algorithm logsig(n) = 1 / (1 + exp(-n))

See Also sim, dlogsig, tansig
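The algorithm line can be checked with a small Python sketch. This is plain Python, not the toolbox's MATLAB code. Evaluating 1 / (1 + exp(-n)) directly overflows for large negative n, so the sketch uses the equivalent form exp(n) / (1 + exp(n)) when n < 0. That split is this sketch's own choice and is not described in the manual. Matrices are assumed to be lists of rows.

```python
import math


def _logsig_scalar(n):
    """1 / (1 + exp(-n)), arranged so that exp() never overflows."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    z = math.exp(n)  # small positive number when n is very negative
    return z / (1.0 + z)


def logsig(N):
    """Apply the log-sigmoid to every element of an S x Q matrix (list of rows)."""
    return [[_logsig_scalar(n) for n in row] for row in N]


n = [x / 10 for x in range(-50, 51)]  # -5:0.1:5, as in the example above
a = logsig([n])[0]
print(min(a), max(a))  # both strictly between 0 and 1
```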
Purpose Mean absolute error performance function


Syntax perf = mae(E,X,PP) 


perf = mae(E,net,PP) 


info = mae(code) 


Description mae is a network performance function. 
mae(E,X,PP) takes from one to three arguments,


E - Matrix or cell array of error vector(s).
X - Vector of all weight and bias values (ignored).
PP - Performance parameters (ignored).


and returns the mean absolute error.

The errors E can be given in cell array form,

E - Nt x TS cell array, each element E{i,ts} is a Vi x Q matrix or [].

or as a matrix,

E - (sum of Vi) x Q matrix

where

Nt = net.numTargets
TS = Number of time steps
Q = Batch size
Vi = net.targets{i}.size

mae(E,net,PP) can take an alternate argument to X,

net - Neural network from which X can be obtained (ignored).

mae(code) returns useful information for each code string:

'deriv' - Name of derivative function.
'name' - Full name.
'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.
Examples Here a perceptron is created with a 1-element input ranging from -10 to 10, and
one neuron.

net = newp([-10 10],1);

Here the network is given a batch of inputs P. The error is calculated by
subtracting the output A from target T. Then the mean absolute error is
calculated.

p = [-10 -5 0 5 10];
t = [0 0 1 1 1];
y = sim(net,p)
e = t-y
perf = mae(e)

Note that mae can be called with only one argument because the other
arguments are ignored. mae supports those arguments to conform to the
standard performance function argument list.

Network Use You can create a standard network that uses mae with newp.

To prepare a custom network to be trained with mae, set net.performFcn to
'mae'. This will automatically set net.performParam to the empty matrix [], as
mae has no performance parameters.

In either case, calling train or adapt will result in mae being used to calculate
performance.

See newp for examples.

See Also mse, msereg, dmae
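X and PP are ignored, so the performance value is just the average of |e| over every error element. Here is a minimal Python sketch of that calculation, not the toolbox code. It models a matrix as a list of rows and a cell array as nested lists of such matrices, which is a simplification of MATLAB cell arrays.

```python
def mae(E):
    """Mean absolute error over every element of E.

    E may be a matrix (list of rows) or a cell array modelled as nested
    lists of matrices; the nesting is flattened before averaging.
    """
    values = []

    def collect(x):
        if isinstance(x, (list, tuple)):
            for item in x:
                collect(item)
        else:
            values.append(abs(x))

    collect(E)
    return sum(values) / len(values)


print(mae([[0, -1, 0.5, 0.5]]))  # 0.5
```

Empty cells ([]) add no values and are skipped by the flattening. An E with no elements at all would divide by zero.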
Purpose Manhattan distance weight function


Syntax Z = mandist(W,P)
df = mandist('deriv')
D = mandist(pos)


Description mandist is the Manhattan distance weight function. Weight functions apply 
weights to an input to get weighted inputs. 


mandist(W,P) takes these inputs,

W - S x R weight matrix.
P - R x Q matrix of Q input (column) vectors.

and returns the S x Q matrix of vector distances.

mandist('deriv') returns '' because mandist does not have a derivative
function.


mandist is also a layer distance function, which can be used to find the 
distances between neurons in a layer. 


mandist(pos) takes one argument,

pos - N x S matrix of neuron positions.

and returns the S x S matrix of distances.


Examples Here we define a random weight matrix W and input vector P and calculate the
corresponding weighted input Z.

W = rand(4,3);
P = rand(3,1);
Z = mandist(W,P)

Here we define a random matrix of positions for 10 neurons arranged in
three-dimensional space and then find their distances.

pos = rand(3,10);
D = mandist(pos)

Network Use You can create a standard network that uses mandist as a distance function by
calling newsom.


To change a network so an input weight uses mandist, set
net.inputWeights{i,j}.weightFcn to 'mandist'. For a layer weight, set
net.layerWeights{i,j}.weightFcn to 'mandist'.

To change a network so a layer's topology uses mandist, set
net.layers{i}.distanceFcn to 'mandist'.

In either case, call sim to simulate the network with mandist. See newpnn or
newgrnn for simulation examples.


Algorithm The Manhattan distance D between two vectors x and y is:

D = sum(abs(x-y))
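Used as a weight function, the formula is applied to every pair of a weight row and an input column, which gives the S x Q result described above. A plain Python sketch, assuming matrices are lists of rows (the toolbox code itself is MATLAB):

```python
def mandist(W, P):
    """Manhattan-distance weight function.

    W: S x R weight matrix (list of S rows).
    P: R x Q input matrix (list of R rows, one column per input vector).
    Returns the S x Q matrix Z with Z[i][q] = sum(abs(W[i] - P[:, q])).
    """
    R, Q = len(P), len(P[0])
    return [
        [sum(abs(w_row[r] - P[r][q]) for r in range(R)) for q in range(Q)]
        for w_row in W
    ]


W = [[0, 0], [1, 2]]
P = [[1], [1]]        # a single 2-element input vector
print(mandist(W, P))  # [[2], [1]]
```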


See Also sim, dist, linkdist
Purpose Maximum learning rate for a linear layer

Syntax lr = maxlinlr(P)
lr = maxlinlr(P,'bias')

Description maxlinlr is used to calculate learning rates for newlin.

maxlinlr(P) takes one argument,

P - R x Q matrix of input vectors.

and returns the maximum learning rate for a linear layer without a bias that
is to be trained only on the vectors in P.

maxlinlr(P,'bias') returns the maximum learning rate for a linear layer
with a bias.

Examples Here we define a batch of four two-element input vectors and find the
maximum learning rate for a linear layer with a bias.

P = [1 2 -4 7; 0.1 3 10 6];
lr = maxlinlr(P,'bias')

See Also linnet, newlin, newlind
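This entry does not give the formula. A usual way to bound a Widrow-Hoff learning rate is the reciprocal of the largest eigenvalue of P*P'. The Python sketch below assumes exactly that rule, scaled by 0.9999 so it stays just under the bound. For the bias case it assumes a row of ones is appended to P. Both the constant and the bias handling are assumptions of this sketch; neither is confirmed by the manual. The largest eigenvalue comes from power iteration, which works for a small symmetric positive semidefinite matrix like P*P'.

```python
import math


def largest_eigenvalue(A, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(A)
    v = [1.0] * n  # start vector; assumed not orthogonal to the top eigenvector
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def maxlinlr(P, bias=False):
    """Sketch: 0.9999 / largest eigenvalue of P*P' (assumed rule).

    P is an R x Q matrix given as a list of R rows. With bias=True a row of
    ones is appended, treating the bias as a weight on a constant input.
    """
    if bias:
        P = P + [[1.0] * len(P[0])]
    R = len(P)
    PPt = [[sum(a * b for a, b in zip(P[i], P[j])) for j in range(R)] for i in range(R)]
    return 0.9999 / largest_eigenvalue(PPt)


P = [[1, 2, -4, 7], [0.1, 3, 10, 6]]
print(maxlinlr(P, bias=True))
```

Adding the row of ones can only raise the largest eigenvalue. So under this assumed rule the bias case always gives a learning rate no larger than the no-bias case.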
Purpose Midpoint weight initialization function

Syntax W = midpoint(S,PR)

Description midpoint is a weight initialization function that sets weight (row) vectors to
the center of the input ranges.

midpoint(S,PR) takes two arguments,

S - Number of rows (neurons).
PR - R x 2 matrix of input value ranges = [Pmin Pmax].

and returns an S x R matrix with rows set to (Pmin+Pmax)'/2.

Examples Here initial weight values are calculated for a 5 neuron layer with input
elements ranging over [0 1] and [-2 2].

W = midpoint(5,[0 1; -2 2])

Network Use You can create a standard network that uses midpoint to initialize weights by
calling newc.

To prepare the weights and the bias of layer i of a custom network to initialize
with midpoint:

1 Set net.initFcn to 'initlay'. (net.initParam will automatically become
initlay's default parameters.)

2 Set net.layers{i}.initFcn to 'initwb'.

3 Set each net.inputWeights{i,j}.initFcn to 'midpoint'. Set each
net.layerWeights{i,j}.initFcn to 'midpoint'.

To initialize the network call init.

See Also initwb, initlay, init
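The (Pmin+Pmax)'/2 rule gives every neuron the same starting weight row. A plain Python sketch of it, with PR assumed to be a list of [Pmin, Pmax] pairs (the toolbox version is MATLAB):

```python
def midpoint(S, PR):
    """S x R weight matrix whose rows are the centres of the input ranges.

    PR is an R x 2 list of [Pmin, Pmax] pairs, one per input element.
    """
    centre = [(pmin + pmax) / 2 for pmin, pmax in PR]
    return [list(centre) for _ in range(S)]  # independent copy for each neuron


print(midpoint(5, [[0, 1], [-2, 2]]))  # five rows of [0.5, 0.0]
```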
Purpose Ranges of matrix rows

Syntax pr = minmax(P)

Description minmax(P) takes one argument,

P - R x Q matrix.

and returns the R x 2 matrix PR of minimum and maximum values for each row
of P.

Examples P = [0 1 2; -1 -2 -0.5]
pr = minmax(P)
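Applied to the example matrix, minmax returns one [min max] pair per row. The same operation as a plain Python sketch, assuming P is a list of rows:

```python
def minmax(P):
    """R x 2 list of [min, max] for each row of the R x Q matrix P."""
    return [[min(row), max(row)] for row in P]


print(minmax([[0, 1, 2], [-1, -2, -0.5]]))  # [[0, 2], [-2, -0.5]]
```

The result has the same layout as the PR argument taken by newc, newff and the other network creation functions in this chapter.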
Purpose Mean squared error performance function

Syntax perf = mse(E,X,PP)
perf = mse(E,net,PP)
info = mse(code)

Description mse is a network performance function. It measures the network's performance
according to the mean of squared errors.

mse(E,X,PP) takes from one to three arguments,

E - Matrix or cell array of error vector(s).
X - Vector of all weight and bias values (ignored).
PP - Performance parameters (ignored).

and returns the mean squared error.

mse(E,net,PP) can take an alternate argument to X,

net - Neural network from which X can be obtained (ignored).

mse(code) returns useful information for each code string:

'deriv' - Name of derivative function.
'name' - Full name.
'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.

Examples Here a two-layer feed-forward network is created with a 1-element input
ranging from -10 to 10, four hidden tansig neurons, and one purelin output
neuron.

net = newff([-10 10],[4 1],{'tansig','purelin'});

Here the network is given a batch of inputs P. The error is calculated by
subtracting the output A from target T. Then the mean squared error is
calculated.

p = [-10 -5 0 5 10];
t = [0 0 1 1 1];
y = sim(net,p)
e = t-y
perf = mse(e)

Note that mse can be called with only one argument because the other
arguments are ignored. mse supports those ignored arguments to conform to
the standard performance function argument list.

Network Use You can create a standard network that uses mse with newff, newcf, or newelm.

To prepare a custom network to be trained with mse, set net.performFcn to
'mse'. This will automatically set net.performParam to the empty matrix [],
as mse has no performance parameters.

In either case, calling train or adapt will result in mse being used to calculate
performance.

See newff or newcf for examples.

See Also msereg, mae, dmse
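mae averages |e|, while mse averages e^2, so large individual errors weigh much more heavily. A minimal Python sketch, assuming E is a plain matrix given as a list of rows (cell-array input is not handled here):

```python
def mse(E):
    """Mean of the squared errors over every element of E (list of rows)."""
    flat = [e for row in E for e in row]
    return sum(e * e for e in flat) / len(flat)


print(mse([[1, -1, 2]]))  # 2.0
```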


msereg 





Purpose Mean squared error with regularization performance function

Syntax perf = msereg(E,X,PP)
perf = msereg(E,net,PP)
info = msereg(code)

Description msereg is a network performance function. It measures network performance
as the weighted sum of two factors: the mean squared error and the mean
squared weight and bias values.

msereg(E,X,PP) takes three arguments,

E - Matrix or cell array of error vector(s).
X - Vector of all weight and bias values.
PP - Performance parameter.

where PP defines one performance parameter,

PP.ratio - Relative importance of errors vs. weight and bias values.

and returns the sum of mean squared errors (times PP.ratio) with the mean
squared weight and bias values (times 1-PP.ratio).

The errors E can be given in cell array form,

E - Nt x TS cell array, each element E{i,ts} is a Vi x Q matrix or [].

or as a matrix,

E - (sum of Vi) x Q matrix

where

Nt = net.numTargets
TS = Number of time steps
Q = Batch size
Vi = net.targets{i}.size

msereg(E,net) takes an alternate argument to X and PP,

net - Neural network from which X and PP can be obtained.

msereg(code) returns useful information for each code string:

'deriv' - Name of derivative function.
'name' - Full name.
'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.

Examples Here a two-layer feed-forward network is created with a one-element input ranging
from -2 to 2, four hidden tansig neurons, and one purelin output neuron.

net = newff([-2 2],[4 1], ...
{'tansig','purelin'},'trainlm','learngdm','msereg');

Here the network is given a batch of inputs P. The error is calculated by
subtracting the output A from target T. Then the mean squared error is
calculated using a ratio of 20/(20+1). (Errors are 20 times as important as
weight and bias values.)

p = [-2 -1 0 1 2];
t = [0 1 1 1 0];
y = sim(net,p)
e = t-y
net.performParam.ratio = 20/(20+1);
perf = msereg(e,net)

Network Use You can create a standard network that uses msereg with newff, newcf, or
newelm.

To prepare a custom network to be trained with msereg, set net.performFcn to
'msereg'. This will automatically set net.performParam to msereg's default
performance parameters.

In either case, calling train or adapt will result in msereg being used to
calculate performance.

See newff or newcf for examples.

See Also mse, mae, dmsereg
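The Description defines msereg as PP.ratio times the mean squared error plus (1 - PP.ratio) times the mean squared weight and bias value. The Python sketch below follows that definition directly. It assumes E is a list of rows and X is a flat list of all weights and biases; it is not the toolbox code.

```python
def msereg(E, X, ratio):
    """ratio * mse(E) + (1 - ratio) * mean squared weight/bias value.

    E: error matrix (list of rows); X: flat list of all weights and biases.
    """
    errors = [e for row in E for e in row]
    mse_part = sum(e * e for e in errors) / len(errors)
    msw_part = sum(x * x for x in X) / len(X)
    return ratio * mse_part + (1 - ratio) * msw_part


# Errors weighted 20 times as heavily as weights, as in the example above.
print(msereg([[1, -1]], [0.5, -0.5], 20 / (20 + 1)))
```

With ratio = 1 this reduces to plain mse. Smaller ratios penalize large weights more, which pushes training toward smaller weights and biases.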
Purpose Negative distance weight function

Syntax Z = negdist(W,P)
df = negdist('deriv')

Description negdist is a weight function. Weight functions apply weights to an input to get
weighted inputs.

negdist(W,P) takes these inputs,

W - S x R weight matrix.
P - R x Q matrix of Q input (column) vectors.

and returns the S x Q matrix of negative vector distances.

negdist('deriv') returns '' because negdist does not have a derivative
function.

Examples Here we define a random weight matrix W and input vector P and calculate the
corresponding weighted input Z.

W = rand(4,3);
P = rand(3,1);
Z = negdist(W,P)

Network Use You can create a standard network that uses negdist by calling newc or
newsom.

To change a network so an input weight uses negdist, set
net.inputWeights{i,j}.weightFcn to 'negdist'. For a layer weight, set
net.layerWeights{i,j}.weightFcn to 'negdist'.

In either case, call sim to simulate the network with negdist. See newc or
newsom for simulation examples.

Algorithm negdist returns the negative Euclidean distance:

z = -sqrt(sum((w-p).^2))

See Also sim, dotprod, dist
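Negating the distance means the weight row closest to an input gets the largest value. That is what lets a competitive layer pick the nearest neuron by taking a maximum. A plain Python sketch of the algorithm line, assuming matrices are lists of rows:

```python
import math


def negdist(W, P):
    """Negative Euclidean distance between each weight row and each input column.

    W: S x R list of rows; P: R x Q list of rows. Returns an S x Q matrix.
    """
    R, Q = len(P), len(P[0])
    return [
        [-math.sqrt(sum((w[r] - P[r][q]) ** 2 for r in range(R))) for q in range(Q)]
        for w in W
    ]


print(negdist([[0, 0], [3, 0]], [[3], [4]]))  # [[-5.0], [-4.0]]
```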
Purpose Product net input function

Syntax N = netprod(Z1,Z2,...,Zn)
df = netprod('deriv')

Description netprod is a net input function. Net input functions calculate a layer's net
input by combining its weighted inputs and biases.

netprod(Z1,Z2,...,Zn) takes,

Zi - S x Q matrices.

and returns an element-wise product of Zi's.

netprod('deriv') returns netprod's derivative function.

Examples Here netprod combines two sets of weighted input vectors (which we have
defined ourselves).

Z1 = [1 2 4; 3 4 1];
Z2 = [-1 2 2; -5 -6 1];
N = netprod(Z1,Z2)

Here netprod combines the same weighted inputs with a bias vector. Because
Z1 and Z2 each contain three concurrent vectors, three concurrent copies of b
must be created with concur so that all sizes match up.

b = [0; -1];
N = netprod(Z1,Z2,concur(b,3))

Network Use You can create a standard network that uses netprod by calling newpnn or
newgrnn.

To change a network so that a layer uses netprod, set
net.layers{i}.netInputFcn to 'netprod'.

In either case, call sim to simulate the network with netprod. See newpnn or
newgrnn for simulation examples.

See Also sim, dnetprod, netsum, concur
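The two netprod examples can be reproduced with a plain Python sketch so the resulting numbers are visible. The helper concur below is a hypothetical stand-in for the toolbox's concur. It simply repeats a column vector q times. Matrices are lists of rows.

```python
def netprod(*Z):
    """Element-wise product of any number of S x Q matrices (lists of rows)."""
    result = [list(row) for row in Z[0]]
    for M in Z[1:]:
        result = [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(result, M)]
    return result


def concur(b, q):
    """Hypothetical stand-in: repeat the column vector b (a list) as q columns."""
    return [[bi] * q for bi in b]


Z1 = [[1, 2, 4], [3, 4, 1]]
Z2 = [[-1, 2, 2], [-5, -6, 1]]
print(netprod(Z1, Z2))                      # [[-1, 4, 8], [-15, -24, 1]]
print(netprod(Z1, Z2, concur([0, -1], 3)))  # [[0, 0, 0], [15, 24, -1]]
```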
Purpose Sum net input function

Syntax N = netsum(Z1,Z2,...,Zn)
df = netsum('deriv')

Description netsum is a net input function. Net input functions calculate a layer's net input
by combining its weighted inputs and biases.

netsum(Z1,Z2,...,Zn) takes any number of inputs,

Zi - S x Q matrices,

and returns N, the element-wise sum of Zi's.

netsum('deriv') returns netsum's derivative function.

Examples Here netsum combines two sets of weighted input vectors (which we have
defined ourselves).

Z1 = [1 2 4; 3 4 1];
Z2 = [-1 2 2; -5 -6 1];
N = netsum(Z1,Z2)

Here netsum combines the same weighted inputs with a bias vector. Because
Z1 and Z2 each contain three concurrent vectors, three concurrent copies of b
must be created with concur so that all sizes match up.

b = [0; -1];
N = netsum(Z1,Z2,concur(b,3))

Network Use You can create a standard network that uses netsum by calling newp or newlin.

To change a network so a layer uses netsum, set net.layers{i}.netInputFcn
to 'netsum'.

In either case, call sim to simulate the network with netsum. See newp or
newlin for simulation examples.

See Also sim, dnetsum, netprod, concur
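Here are the same two netsum examples as a plain Python sketch, showing the sums they produce. As before, concur is a hypothetical stand-in that repeats a column vector q times, and matrices are lists of rows.

```python
def netsum(*Z):
    """Element-wise sum of any number of S x Q matrices (lists of rows)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Z)]


def concur(b, q):
    """Hypothetical stand-in: repeat the column vector b (a list) as q columns."""
    return [[bi] * q for bi in b]


Z1 = [[1, 2, 4], [3, 4, 1]]
Z2 = [[-1, 2, 2], [-5, -6, 1]]
print(netsum(Z1, Z2))                      # [[0, 4, 6], [-2, -2, 2]]
print(netsum(Z1, Z2, concur([0, -1], 3)))  # [[0, 4, 6], [-3, -3, 1]]
```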
Purpose Create a custom neural network


Syntax net = network 


net = network(numInputs,numLayers,biasConnect,inputConnect, ...
layerConnect,outputConnect,targetConnect)


To Get Help Type help network/network


Description network creates new custom networks. It is used to create networks that are
then customized by functions such as newp, newlin, newff, etc.


network takes these optional arguments (shown with default values):


numInputs - Number of inputs, 0.
numLayers - Number of layers, 0.
biasConnect - numLayers-by-1 Boolean vector, zeros.
inputConnect - numLayers-by-numInputs Boolean matrix, zeros.
layerConnect - numLayers-by-numLayers Boolean matrix, zeros.
outputConnect - 1-by-numLayers Boolean vector, zeros.
targetConnect - 1-by-numLayers Boolean vector, zeros.


and returns,


net - New network with the given property values. 
Properties Architecture properties:

net.numInputs: 0 or a positive integer.
Number of inputs.
net.numLayers: 0 or a positive integer.
Number of layers.
net.biasConnect: numLayers-by-1 Boolean vector.

If net.biasConnect(i) is 1, then layer i has a bias and net.biases{i}
is a structure describing that bias.


net.inputConnect: numLayers-by-numInputs Boolean matrix.

If net.inputConnect(i,j) is 1, then layer i has a weight coming from
input j and net.inputWeights{i,j} is a structure describing that weight.

net.layerConnect: numLayers-by-numLayers Boolean matrix.

If net.layerConnect(i,j) is 1, then layer i has a weight coming from
layer j and net.layerWeights{i,j} is a structure describing that weight.

net.outputConnect: 1-by-numLayers Boolean vector.

If net.outputConnect(i) is 1, then the network has an output from layer
i and net.outputs{i} is a structure describing that output.

net.targetConnect: 1-by-numLayers Boolean vector.

If net.targetConnect(i) is 1, then the network has a target from layer i
and net.targets{i} is a structure describing that target.


net.numOutputs: 0 or a positive integer. Read only.

Number of network outputs according to net.outputConnect.
net.numTargets: 0 or a positive integer. Read only.

Number of targets according to net.targetConnect.
net.numInputDelays: 0 or a positive integer. Read only.

Maximum input delay according to all net.inputWeights{i,j}.delays.
net.numLayerDelays: 0 or a positive number. Read only.

Maximum layer delay according to all net.layerWeights{i,j}.delays.
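The Boolean connection properties determine which bias, weight, output and target subobjects exist, and they also fix the read-only counts. Here is a minimal Python sketch of that bookkeeping. The function name describe_architecture is hypothetical, and indices are 1-based to match the MATLAB text. The arguments are those of the one-line example network(1,2,[1;0],[1;0],[0 0;1 0],[0 1],[0 1]) given with this entry's examples.

```python
def describe_architecture(bias_connect, input_connect, layer_connect,
                          output_connect, target_connect):
    """List which biases, weights, outputs and targets the Boolean
    connection matrices imply (1-based indices, as in the manual)."""
    return {
        "biases": [i + 1 for i, c in enumerate(bias_connect) if c],
        "inputWeights": [(i + 1, j + 1) for i, row in enumerate(input_connect)
                         for j, c in enumerate(row) if c],
        "layerWeights": [(i + 1, j + 1) for i, row in enumerate(layer_connect)
                         for j, c in enumerate(row) if c],
        "numOutputs": sum(output_connect),
        "numTargets": sum(target_connect),
    }


# Same arguments as network(1,2,[1;0],[1;0],[0 0;1 0],[0 1],[0 1]).
print(describe_architecture([1, 0], [[1], [0]], [[0, 0], [1, 0]], [0, 1], [0, 1]))
# {'biases': [1], 'inputWeights': [(1, 1)], 'layerWeights': [(2, 1)], 'numOutputs': 1, 'numTargets': 1}
```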
Subobject structure properties:

net.inputs: numInputs-by-1 cell array.
net.inputs{i} is a structure defining input i.

net.layers: numLayers-by-1 cell array.
net.layers{i} is a structure defining layer i.

net.biases: numLayers-by-1 cell array.

If net.biasConnect(i) is 1, then net.biases{i} is a structure defining
the bias for layer i.

net.inputWeights: numLayers-by-numInputs cell array.

If net.inputConnect(i,j) is 1, then net.inputWeights{i,j} is a
structure defining the weight to layer i from input j.

net.layerWeights: numLayers-by-numLayers cell array.

If net.layerConnect(i,j) is 1, then net.layerWeights{i,j} is a
structure defining the weight to layer i from layer j.

net.outputs: 1-by-numLayers cell array.

If net.outputConnect(i) is 1, then net.outputs{i} is a structure
defining the network output from layer i.

net.targets: 1-by-numLayers cell array.

If net.targetConnect(i) is 1, then net.targets{i} is a structure
defining the network target to layer i.

Function properties:

net.adaptFcn: name of a network adaption function or ''.
net.initFcn: name of a network initialization function or ''.
net.performFcn: name of a network performance function or ''.
net.trainFcn: name of a network training function or ''.

Parameter properties:

net.adaptParam: network adaption parameters.
net.initParam: network initialization parameters.
net.performParam: network performance parameters.
net.trainParam: network training parameters.
Weight and bias value properties:

net.IW: numLayers-by-numInputs cell array of input weight values.
net.LW: numLayers-by-numLayers cell array of layer weight values.
net.b: numLayers-by-1 cell array of bias values.

Other properties:

net.userdata: Structure you can use to store useful values.

Examples Here is the code to create a network without any inputs and layers, and then
set its number of inputs and layers to 1 and 2 respectively.

net = network
net.numInputs = 1
net.numLayers = 2


Here is the code to create the same network with one line of code. 

net = network(1,2) 
Here is the code to create a 1 input, 2 layer, feed-forward network. Only the
first layer will have a bias. An input weight will connect to layer 1 from input
1. A layer weight will connect to layer 2 from layer 1. Layer 2 will be a network
output, and have a target.


net = network(1,2,[1;0],[1; 0],[0 0; 1 0],[0 1],[0 1]) 
We can then see the properties of subobjects as follows: 


net.inputs{1}
net.layers{1}, net.layers{2}
net.biases{1}
net.inputWeights{1,1}, net.layerWeights{2,1}
net.outputs{2}
net.targets{2}

We can get the weight matrices and bias vector as follows:

net.IW{1,1}, net.LW{2,1}, net.b{1}

We can alter the properties of any of these subobjects. Here we change the
transfer functions of both layers:

net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'logsig';

Here we change the number of elements in input 1 to 2, by setting each
element's range:

net.inputs{1}.range = [0 1; -1 1];

Next we can simulate the network for a two-element input vector:

p = [0.5; -0.1];
y = sim(net,p)


See Also sim







Purpose Create a competitive layer

Syntax net = newc
net = newc(PR,S,KLR,CLR)

Description Competitive layers are used to solve classification problems.

net = newc creates a new network with a dialog box.

net = newc(PR,S,KLR,CLR) takes these inputs,

PR - R x 2 matrix of min and max values for R input elements.
S - Number of neurons.
KLR - Kohonen learning rate, default = 0.01.
CLR - Conscience learning rate, default = 0.001.

and returns a new competitive layer.

Properties Competitive layers consist of a single layer with the negdist weight function,
netsum net input function, and the compet transfer function.

The layer has a weight from the input, and a bias.

Weights and biases are initialized with midpoint and initcon.

Adaption and training are done with trains and trainr, which both update
weight and bias values with the learnk and learncon learning functions.

Examples Here is a set of four two-element vectors P.

P = [.1 .8 .1 .9; .2 .9 .1 .8];

A competitive layer can be used to divide these inputs into two classes. First a
two neuron layer is created with two input elements ranging from 0 to 1, then
it is trained.

net = newc([0 1; 0 1],2)
net = train(net,P);

The resulting network can then be simulated and its output vectors converted
to class indices.

Y = sim(net,P)
Yc = vec2ind(Y)

See Also sim, init, adapt, train, trains, trainr, newcf
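The Properties paragraph describes one layer built from negdist, netsum and compet, with learnk updating the weights. Below is a minimal Python sketch of one simulation pass and one weight update. It has several assumptions. First, it assumes the Kohonen rule dw = lr*(p - w), applied to the winning neuron only. Second, it leaves out the conscience biases that learncon adjusts. Third, the weights are hand-picked rather than trained. All function names are hypothetical.

```python
import math


def compet_classify(W, b, P):
    """Class index (1-based) of the winning neuron for each input column of P.

    Net input = negdist(W, p) + b; the largest net input wins.
    """
    R, Q = len(P), len(P[0])
    classes = []
    for q in range(Q):
        n = [-math.sqrt(sum((w[r] - P[r][q]) ** 2 for r in range(R))) + bi
             for w, bi in zip(W, b)]
        classes.append(n.index(max(n)) + 1)  # ties go to the first neuron here
    return classes


def kohonen_update(w, p, lr=0.01):
    """Move the winning neuron's weight row a fraction lr toward input p."""
    return [wi + lr * (pi - wi) for wi, pi in zip(w, p)]


P = [[.1, .8, .1, .9], [.2, .9, .1, .8]]  # the four vectors from the example
W = [[0.1, 0.15], [0.85, 0.85]]           # hypothetical trained weights
print(compet_classify(W, [0.0, 0.0], P))  # [1, 2, 1, 2]
```

midpoint starts every neuron at the same weight row, so without something to break the tie the first neuron would keep winning. The conscience biases are the toolbox's mechanism for handling that. This sketch avoids the issue only by starting from distinct, hand-picked weights.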
Purpose Create a trainable cascade-forward backpropagation network

Syntax net = newcf
net = newcf(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)

Description net = newcf creates a new network with a dialog box.

newcf(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF) takes,

PR - R x 2 matrix of min and max values for R input elements.
Si - Size of ith layer, for Nl layers.
TFi - Transfer function of ith layer, default = 'tansig'.
BTF - Backpropagation network training function, default = 'trainlm'.
BLF - Backpropagation weight/bias learning function, default = 'learngdm'.
PF - Performance function, default = 'mse'.

and returns an Nl layer cascade-forward backprop network.


The transfer functions TFi can be any differentiable transfer function such as
tansig, logsig, or purelin.


The training function BTF can be any ofthe backprop training functions such 
as trainlm, trainbfg, trainrp, traingd, etc. 





Caution: trainlm is the default training function because it is very fast, but it
requires a lot of memory to run. If you get an "out-of-memory" error when
training try doing one of these:





1 Slow trainlm training, but reduce memory requirements by setting
net.trainParam.mem_reduc to 2 or more. (See help trainlm.)

2 Use trainbfg, which is slower but more memory-efficient than trainlm.
3 Use trainrp, which is slower but more memory-efficient than trainbfg.


The learning function BLF can be either of the backpropagation learning
functions such as learngd or learngdm.

The performance function can be any of the differentiable performance
functions such as mse or msereg.
Examples Here is a problem consisting of inputs P and targets T that we would like to
solve with a network.

P = [0 1 2 3 4 5 6 7 8 9 10];
T = [0 1 2 3 4 3 2 1 2 3 4];

Here a two-layer cascade-forward network is created. The network's input
ranges from [0 to 10]. The first layer has five tansig neurons, the second layer
has one purelin neuron. The trainlm network training function is to be used.

net = newcf([0 10],[5 1],{'tansig' 'purelin'});

Here the network is simulated and its output plotted against the targets.

Y = sim(net,P);
plot(P,T,P,Y,'o')

Here the network is trained for 50 epochs. Again the network's output is
plotted.

net.trainParam.epochs = 50
net = train(net,P,T)
Y = sim(net,P);
plot(P,T,P,Y,'o')

Algorithm Cascade-forward networks consist of Nl layers using the dotprod weight
function, netsum net input function, and the specified transfer functions.

The first layer has weights coming from the input. Each subsequent layer has
weights coming from the input and all previous layers. All layers have biases.
The last layer is the network output.

Each layer's weights and biases are initialized with initnw.

Adaption is done with trains, which updates weights with the specified
learning function. Training is done with the specified training function.
Performance is measured according to the specified performance function.

See Also newff, newelm, sim, init, adapt, train, trains
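According to the Algorithm paragraph, every layer after the first gets weights from the input and also from all earlier layers. Here is a minimal Python sketch of the forward pass that results, for a single input vector. The argument layout (IW, LW, b, tf) was chosen for this sketch only and is not the toolbox's network object.

```python
def dotprod(W, x):
    """Weight matrix (list of rows) times column vector x (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cascade_forward(p, IW, LW, b, tf):
    """Forward pass of a cascade-forward network.

    p: input vector; IW[k]: input weights of layer k;
    LW[k][j]: weights from layer j to layer k for every j < k;
    b[k]: bias vector; tf[k]: transfer function for layer k.
    """
    outputs = []
    for k in range(len(b)):
        n = [bi + zi for bi, zi in zip(b[k], dotprod(IW[k], p))]
        for j in range(k):  # every earlier layer also feeds layer k
            n = [ni + zi for ni, zi in zip(n, dotprod(LW[k][j], outputs[j]))]
        outputs.append([tf[k](x) for x in n])
    return outputs[-1]


def purelin(x):
    return x


# Hypothetical 1-1 network: layer 1 gets 2*p + 1, layer 2 gets 3*p + a1.
print(cascade_forward([1.0], IW=[[[2.0]], [[3.0]]], LW=[[], [[[1.0]]]],
                      b=[[1.0], [0.0]], tf=[purelin, purelin]))  # [6.0]
```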
Purpose Create an Elman backpropagation network

Syntax net = newelm
net = newelm(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)

Description net = newelm creates a new network with a dialog box.

newelm(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF) takes several
arguments,

PR - R x 2 matrix of min and max values for R input elements.
Si - Size of ith layer, for Nl layers.
TFi - Transfer function of ith layer, default = 'tansig'.
BTF - Backpropagation network training function, default = 'traingdx'.
BLF - Backpropagation weight/bias learning function, default = 'learngdm'.
PF - Performance function, default = 'mse'.

and returns an Elman network.


The training function BTF can be any ofthe backprop training functions such 
as trainlm, trainbfg, trainrp, traingd, etc. 





Caution: trainlm is the default training function because it is very fast, but it
requires a lot of memory to run. If you get an "out-of-memory" error when
training try doing one of these:





1 Slow trainlm training, but reduce memory requirements by setting
net.trainParam.mem_reduc to 2 or more. (See help trainlm.)

2 Use trainbfg, which is slower but more memory-efficient than trainlm.
3 Use trainrp, which is slower but more memory-efficient than trainbfg.


The learning function BLF can be either of the backpropagation learning
functions such as learngd or learngdm.


The performance function can be any ofthe differentiable performance 
functions such as mse or msereg. 
Examples Here is a series of Boolean inputs P, and another sequence T, which is 1
wherever P has had two 1's in a row.

P = round(rand(1,20));
T = [0 (P(1:end-1)+P(2:end) == 2)];

We would like the network to recognize whenever two 1's occur in a row. First
we arrange these values as sequences.

Pseq = con2seq(P);
Tseq = con2seq(T);

Next we create an Elman network whose input varies from 0 to 1, and has ten
hidden neurons and 1 output.

net = newelm([0 1],[10 1],{'tansig','logsig'})

Then we train the network with a mean squared error goal of 0.1, and simulate
it.

net.trainParam.goal = 0.1;
net = train(net,Pseq,Tseq);
Y = sim(net,Pseq)

Algorithm Elman networks consist of Nl layers using the dotprod weight function, netsum
net input function, and the specified transfer functions.

The first layer has weights coming from the input. Each subsequent layer has
a weight coming from the previous layer. All layers except the last have a
recurrent weight. All layers have biases. The last layer is the network output.

Each layer's weights and biases are initialized with initnw.

Adaption is done with trains, which updates weights with the specified
learning function. Training is done with the specified training function.
Performance is measured according to the specified performance function.

See Also newff, newcf, sim, init, adapt, train, trains
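The recurrent weight lets an Elman network respond to earlier inputs, which the "two 1's in a row" example depends on. The Python sketch below simulates a two-layer Elman network with a scalar input sequence, mirroring the example: a tansig hidden layer whose previous output feeds back into itself, plus one logsig output neuron. It assumes the fed-back hidden state starts at zero, and every weight shown is hypothetical. tansig is mathematically equal to tanh, so math.tanh is used for it.

```python
import math


def elman_sim(P, w_in, W_rec, b1, w_out, b2):
    """Simulate a two-layer Elman network on a scalar input sequence P.

    Hidden layer: tansig (= tanh) with a recurrent weight from its own output
    at the previous time step. Output layer: one logsig neuron.
    """
    hidden = [0.0] * len(b1)  # assumed zero initial layer state
    Y = []
    for p in P:
        # The right-hand side uses the previous step's hidden outputs.
        hidden = [
            math.tanh(b1[i] + w_in[i] * p
                      + sum(W_rec[i][k] * hidden[k] for k in range(len(hidden))))
            for i in range(len(b1))
        ]
        n2 = b2 + sum(w * h for w, h in zip(w_out, hidden))
        Y.append(1.0 / (1.0 + math.exp(-n2)))
    return Y


# One hidden neuron: the second output stays above 0.5 even though the
# second input is 0, because the hidden layer remembers the first input.
print(elman_sim([1, 0], [1.0], [[1.0]], [0.0], [1.0], 0.0))
```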
Purpose Create a feed-forward backpropagation network

Syntax net = newff
net = newff(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)

Description net = newff creates a new network with a dialog box.

newff(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF) takes,

PR - R x 2 matrix of min and max values for R input elements.
Si - Size of ith layer, for Nl layers.
TFi - Transfer function of ith layer, default = 'tansig'.
BTF - Backpropagation network training function, default = 'trainlm'.
BLF - Backpropagation weight/bias learning function, default = 'learngdm'.
PF - Performance function, default = 'mse'.

and returns an Nl layer feed-forward backprop network.


The transfer functions TFi can be any differentiable transfer function such as
tansig, logsig, or purelin.


The training function BTF can be any ofthe backprop training functions such 
as trainlm, trainbfg, trainrp, traingd, etc. 





Caution: trainlm is the default training function because it is very fast, but it
requires a lot of memory to run. If you get an "out-of-memory" error when
training try doing one of these:





1 Slow trainlm training, but reduce memory requirements by setting
net.trainParam.mem_reduc to 2 or more. (See help trainlm.)

2 Use trainbfg, which is slower but more memory-efficient than trainlm.
3 Use trainrp, which is slower but more memory-efficient than trainbfg.


The learning function BLF can be either of the backpropagation learning
functions such as learngd or learngdm.

The performance function can be any of the differentiable performance
functions such as mse or msereg.
Examples Here is a problem consisting of inputs P and targets T that we would like to
solve with a network.

P = [0 1 2 3 4 5 6 7 8 9 10];
T = [0 1 2 3 4 3 2 1 2 3 4];

Here a two-layer feed-forward network is created. The network's input ranges
from [0 to 10]. The first layer has five tansig neurons, the second layer has one
purelin neuron. The trainlm network training function is to be used.

net = newff([0 10],[5 1],{'tansig' 'purelin'})

Here the network is simulated and its output plotted against the targets.

Y = sim(net,P);
plot(P,T,P,Y,'o')

Here the network is trained for 50 epochs. Again the network's output is
plotted.

net.trainParam.epochs = 50
net = train(net,P,T)
Y = sim(net,P);
plot(P,T,P,Y,'o')

Algorithm Feed-forward networks consist of Nl layers using the dotprod weight function,
netsum net input function, and the specified transfer functions.

The first layer has weights coming from the input. Each subsequent layer has
a weight coming from the previous layer. All layers have biases. The last layer
is the network output.

Each layer's weights and biases are initialized with initnw.

Adaption is done with trains, which updates weights with the specified
learning function. Training is done with the specified training function.
Performance is measured according to the specified performance function.

See Also newcf, newelm, sim, init, adapt, train, trains
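In a feed-forward network each layer after the first takes input only from the layer directly before it. The Python sketch below shows the forward pass for the two-layer case used in the example: tansig hidden neurons (math.tanh, which equals tansig mathematically) followed by purelin output neurons. The weights in the usage line are hypothetical and are not trained values.

```python
import math


def feedforward(p, W1, b1, W2, b2):
    """Two-layer feed-forward pass: tansig hidden layer, purelin output layer.

    W1: S1 x R rows, b1: S1 biases, W2: S2 x S1 rows, b2: S2 biases.
    """
    a1 = [math.tanh(bi + sum(w * x for w, x in zip(row, p)))
          for row, bi in zip(W1, b1)]
    return [bi + sum(w * a for w, a in zip(row, a1)) for row, bi in zip(W2, b2)]


# Hypothetical weights: two hidden neurons whose outputs cancel exactly.
print(feedforward([0.7], [[1.0], [-1.0]], [0.0, 0.0], [[1.0, 1.0]], [0.0]))
```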
Purpose


Syntax


Description


Create a feed-forward input-delay backpropagation network 


net = newfftd 
net = newfftd(PR,ID,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)


net = newfftd creates a new network with a dialog box. 
newfftd(PR,ID,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF) takes,


PR - R x 2 matrix of min and max values for R input elements.
ID - Input delay vector.
Si - Size of ith layer, for Nl layers.
TFi - Transfer function of ith layer, default = 'tansig'.
BTF - Backprop network training function, default = 'trainlm'.
BLF - Backprop weight/bias learning function, default = 'learngdm'.
PF - Performance function, default = 'mse'.

and returns an Nl layer feed-forward backprop network.


The transfer functions TFi can be any differentiable transfer function such as
tansig, logsig, or purelin.


The training function BTF can be any of the backprop training functions such
as trainlm, trainbfg, trainrp, traingd, etc.





Caution: trainlm is the default training function because it is very fast, but it
requires a lot of memory to run. If you get an "out-of-memory" error when
training, try doing one of these:





1 Slow trainlm training, but reduce memory requirements by setting
net.trainParam.mem_reduc to 2 or more. (See help trainlm.) 


2 Use trainbfg, which is slower but more memory-efficient than trainlm.
3 Use trainrp, which is slower but more memory-efficient than trainbfg. 


The learning function BLF can be either of the backpropagation learning
functions such as learngd or learngdm.

The performance function can be any of the differentiable performance
functions such as mse or msereg.


Here is a problem consisting of an input sequence P and target sequence T that
can be solved by a network with one delay.


P = {1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 1};
T = {1 -1 0 1 0 -1 1 -1 0 0 0 1 0 -1 0 1};
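Why one delay is enough: in this example data each target equals the current input minus the previous input, taking the input before the first time step as 0. That rule is an observation about the data, not something this page states; the short Python check below (a hypothetical helper, not toolbox code) confirms it, so each target depends only on the inputs at delays 0 and 1.

```python
def first_difference(p, initial=0):
    # y(t) = p(t) - p(t-1); the value before the first step is assumed to be `initial`.
    out, prev = [], initial
    for x in p:
        out.append(x - prev)
        prev = x
    return out


# Example sequences from this page, written as Python lists.
P = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]
T = [1, -1, 0, 1, 0, -1, 1, -1, 0, 0, 0, 1, 0, -1, 0, 1]
print(first_difference(P) == T)   # True: the rule holds at every time step
```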

Here a two-layer feed-forward network is created with input delays of 0 and 1. 
The network's input ranges from [0 to 1]. The first layer has five tansig
neurons, the second layer has one purelin neuron. The trainlm network 
training function is to be used. 


net = newfftd([0 1],[0 1],[5 1],{'tansig' 'purelin'});


Here the network is simulated.

Y = sim(net,P)
Here the network is trained for 50 epochs. Again the network's output is 
calculated. 


net.trainParam.epochs = 50 
net = train(net,P,T) 
Y = sim(net,P)


Feed-forward networks consist of Nl layers using the dotprod weight function,
netsum net input function, and the specified transfer functions.


The first layer has weights coming from the input with the specified input
delays. Each subsequent layer has a weight coming from the previous layer. All
layers have biases. The last layer is the network output.


Each layer's weights and biases are initialized with initnw.


Adaption is done with trains, which updates weights with the specified 
learning function. Training is done with the specified training function. 
Performance is measured according to the specified performance function. 


newcf, newelm, sim, init, adapt, train, trains
Purpose


Syntax 


Description


Properties 


Examples 


See Also 


Design a generalized regression neural network (grnn)


net = newgrnn 


net = newgrnn(P,T,spread)


net = newgrnn creates a new network with a dialog box. 


Generalized regression neural networks are a kind of radial basis network that
is often used for function approximation. grnns can be designed very quickly.


newgrnn(P,T,spread) takes three inputs,
P - R x Q matrix of Q input vectors.
T - S x Q matrix of Q target class vectors.
spread - Spread of radial basis functions, default = 1.0.
and returns a new generalized regression neural network.

The larger the spread, the smoother the function approximation will be. To
fit data very closely, use a spread smaller than the typical distance between
input vectors. To fit the data more smoothly, use a larger spread.


newgrnn creates a two-layer network. The first layer has radbas neurons,
calculates weighted inputs with dist and net input with netprod. The second 
layer has purelin neurons, calculates weighted input with normprod and net 
inputs with netsum. Only the first layer has biases. 


newgrnn sets the first layer weights to P, and the first layer biases are all set
to 0.8326/spread, resulting in radial basis functions that cross 0.5 at weighted
inputs of +/- spread. The second layer weights W2 are set to T.
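Because the design steps above fully determine a GRNN, its output can be computed directly. The Python sketch below is limited to scalar inputs and outputs and is not taken from the toolbox; it assumes radbas(n) = exp(-n^2) and follows the layer description: distances to the stored inputs P, scaled by the bias 0.8326/spread, pass through radbas, and the second layer forms the normprod-weighted average of the targets T. The function names are hypothetical.

```python
import math


def radbas(n):
    # Radial basis transfer function: exp(-n^2); equals about 0.5 at n = +/-0.8326.
    return math.exp(-n * n)


def grnn_predict(P, T, p, spread=1.0):
    # Scalar-input, scalar-output sketch of the two GRNN layers described above.
    b1 = 0.8326 / spread                          # first-layer bias
    # Layer 1: dist to each stored input (weights = P), times the bias (netprod), radbas.
    a1 = [radbas(abs(p - pj) * b1) for pj in P]
    # Layer 2: weights = T, normprod weighting, purelin output (a weighted average of T).
    return sum(tj * aj for tj, aj in zip(T, a1)) / sum(a1)


P = [1, 2, 3]
T = [2.0, 4.1, 5.9]
print(grnn_predict(P, T, 1.5))               # between the targets for inputs 1 and 2
print(grnn_predict(P, T, 1.1, spread=0.1))   # tiny spread: close to the nearest target
```

With a very small spread and an input far from every stored vector, all radbas outputs can underflow to zero and this sketch divides by zero; it makes no claim about how the toolbox handles that case.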


Here we design a radial basis network given inputs P and targets T.


P = [1 2 3];
T = [2.0 4.1 5.9];
net = newgrnn(P,T);


Here the network is simulated for a new input.


P=1.5; 
Y = sim(net,P)


sim, newrb, newrbe, newpnn
References Wasserman, P.D., Advanced Methods in Neural Computing, New York: Van
Nostrand Reinhold, pp. 155-61, 1993.
Purpose


Syntax


Description


Properties 


Examples 


Create a Hopfield recurrent network 


net = newhop 


net = newhop(T) 


Hopfield networks are used for pattern recall.
net = newhop creates a new network with a dialog box. 
newhop(T) takes one input argument,


T - R x Q matrix of Q target vectors. (Values must be +1 or -1.)


and returns a new Hopfield recurrent neural network with stable points at the
vectors in T.


Hopfield networks consist of a single layer with the dotprod weight function,
netsum net input function, and the satlins transfer function.


The layer has a recurrent weight from itself and a bias. 
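As a sketch of what simulating this layer means, each time step computes a(t+1) = satlins(W a(t) + b), where satlins clips each element to [-1, 1]. The Python below is illustrative only: newhop designs W and b with the algorithm cited in the reference for this page, but to stay short the sketch substitutes a plain outer-product (Hebbian) weight matrix with zero bias, which happens to make the two example targets stable. Treat the helper names and the weight rule as assumptions, not as newhop's method.

```python
def satlins(n):
    # Symmetric saturating linear transfer function: clip n to [-1, 1].
    return max(-1.0, min(1.0, n))


def hopfield_step(W, b, a):
    # One time step: dotprod weights, netsum with bias, satlins -> next layer output.
    return [satlins(sum(w * x for w, x in zip(row, a)) + bi) for row, bi in zip(W, b)]


def outer_product_weights(targets):
    # Hypothetical stand-in for newhop's design: W[i][j] = sum over targets of t[i]*t[j].
    n = len(targets[0])
    return [[sum(t[i] * t[j] for t in targets) for j in range(n)] for i in range(n)]


targets = [[-1, -1, 1], [1, -1, 1]]   # the columns of T in the example below
W = outer_product_weights(targets)
b = [0.0, 0.0, 0.0]

a = [-0.9, -0.8, 0.7]                  # corrupted vector used in the example below
for _ in range(5):                      # five time steps
    a = hopfield_step(W, b, a)
print(a)
```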


Here we create a Hopfield network with two three-element stable points T. 


T = [-1 -1 1; 1 -1 1]';
net = newhop(T);


Below we check that the network is stable at these points by using them as
initial layer delay conditions. If the network is stable we would expect that the
outputs Y will be the same. (Since Hopfield networks have no inputs, the second
argument to sim is Q = 2 when using matrix notation.)

Ai = T;

[Y,Pf,Af] = sim(net,2,[],Ai);

Y 


To see if the network can correct a corrupted vector, run the following code,
which simulates the Hopfield network for five time steps. (Since Hopfield
networks have no inputs, the second argument to sim is {Q TS} = [1 5] when
using cell array notation.)


Ai = {[-0.9; -0.8; 0.7]};
[Y,Pf,Af] = sim(net,{1 5},{},Ai);
Y{1}
If you run the above code, Y{1} will equal T(:,1) if the network has managed
to convert the corrupted vector Ai to the nearest target vector.


Hopfield networks are designed to have stable layer outputs as defined by
user-supplied targets. The algorithm minimizes the number of unwanted stable
points.


sim, satlins
Li, J., A. N. Michel, and W. Porod, "Analysis and synthesis of a class of neural
networks: linear systems operating on a closed hypercube," IEEE Transactions
on Circuits and Systems, vol. 36, no. 11, pp. 1405-1422, November 1989.
Purpose


Syntax


Description


Examples 


Create a linear layer 


net = newlin 


net = newlin(PR,S,ID,LR) 


Linear layers are often used as adaptive filters for signal processing and 
prediction. 


net = newlin creates a new network with a dialog box. 
newlin(PR,S,ID,LR) takes these arguments,


PR - R x 2 matrix of min and max values for R input elements.
S - Number of elements in the output vector.
ID - Input delay vector, default = [0].
LR - Learning rate, default = 0.01.


and returns a new linear layer. 
net = newlin(PR,S,0,P) takes an alternate argument,


P - Matrix of input vectors,


and returns a linear layer with the maximum stable learning rate for learning 
with inputs P. 


This code creates a single-input (range of [-1 1]) linear layer with one neuron,
input delays of 0 and 1, and a learning rate of 0.01. It is simulated for an input
sequence P1.


net = newlin([-1 1],1,[0 1],0.01);
P1 = {0 -1 1 1 0 -1 1 0 0 1};
Y = sim(net,P1)


Here targets T1 are defined and the layer adapts to them. (Since this is the first 
call to adapt, the default input delay conditions are used.) 
T1 = {0 -1 0 2 1 -1 0 1 0 1};
[net,Y,E,Pf] = adapt(net,P1,T1); Y


Here the linear layer continues to adapt for a new sequence using the previous
final conditions Pf as initial conditions.
P2 = {1 0 -1 -1 1 1 1 0 -1};
T2 = {2 1 -1 -2 0 2 2 1 -1};
[net,Y,E,Pf] = adapt(net,P2,T2,Pf); Y


Here we initialize the layer's weights and biases to new values.

net = init(net);
Here we train the newly initialized layer on the entire sequence for 200 epochs
to an error goal of 0.1.


P3 = [P1 P2]; 

T3 = [T1 T2]; 
net.trainParam.epochs = 200; 
net.trainParam.goal = 0.1; 
net = train(net,P3,T3);
Y = sim(net,[P1 P2])


Linear layers consist of a single layer with the dotprod weight function, netsum
net input function, and purelin transfer function.

The layer has a weight from the input and a bias. 

Weights and biases are initialized with initzero.

Adaption and training are done with trains and trainb, which both update
weight and bias values with learnwh. Performance is measured with mse.
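The Widrow-Hoff rule used by learnwh adjusts each weight by the learning rate times the error times the corresponding input, and the bias by the learning rate times the error. The Python sketch below applies that update sample by sample to a one-neuron linear layer with input delays [0 1], like the example above. It is an illustration under stated assumptions (zero initial weights, bias and delay state; hypothetical function names), not the toolbox's trains/learnwh code.

```python
def adapt_linear(P, T, lr=0.01, passes=1):
    # One linear neuron with input delays [0 1]: y(t) = w0*p(t) + w1*p(t-1) + b.
    # Weights, bias and the delayed input all start at zero (an assumption).
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(passes):
        prev = 0.0
        for p, t in zip(P, T):
            e = t - (w0 * p + w1 * prev + b)   # error for this time step
            # Widrow-Hoff update: weight += lr*e*input, bias += lr*e
            w0 += lr * e * p
            w1 += lr * e * prev
            b += lr * e
            prev = p
    return w0, w1, b


def mse(P, T, w0, w1, b):
    # Mean squared error of the delayed linear neuron over one pass of the sequence.
    prev, total = 0.0, 0.0
    for p, t in zip(P, T):
        total += (t - (w0 * p + w1 * prev + b)) ** 2
        prev = p
    return total / len(P)


P1 = [0, -1, 1, 1, 0, -1, 1, 0, 0, 1]
T1 = [0, -1, 0, 2, 1, -1, 0, 1, 0, 1]
w0, w1, b = adapt_linear(P1, T1, lr=0.1, passes=200)
print(round(w0, 3), round(w1, 3), round(b, 3), mse(P1, T1, w0, w1, b))
```

The example targets happen to satisfy T1(t) = P1(t) + P1(t-1), so with enough passes the weights approach 1 and 1 and the bias approaches 0.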


newlind, sim, init, adapt, train, trains, trainb
Purpose


Syntax


Description


Examples 


Design a linear layer 


net = newlind 


net = newlind(P,T,Pi) 


net = newlind creates a new network with a dialog box. 
newlind(P,T,Pi) takes these input arguments,


P - R x Q matrix of Q input vectors.
T - S x Q matrix of Q target class vectors.
Pi - 1 x ID cell array of initial input delay states, where each element
Pi{i,k} is an Ri x Q matrix, default = [].
and returns a linear layer designed to output T (with minimum sum square
error) given input P.


newlind(P,T,Pi) can also solve for linear networks with input delays and 
multiple inputs and layers by supplying input and target data in cell array 
form: 

P - Ni x TS cell array, each element P{i,ts} is an Ri x Q input matrix,
T - Nt x TS cell array, each element T{i,ts} is a Vi x Q matrix,
Pi - Ni x ID cell array, each element Pi{i,k} is an Ri x Q matrix, default = [],
and returns a linear network with ID input delays, Ni network inputs, and Nl
layers, designed to output T (with minimum sum square error) given input P.
We would like a linear layer that outputs T given P for the following definitions:


P = [1 2 3];
T = [2.0 4.1 5.9];


Here we use newlind to design such a network and check its response. 


net = newlind(P,T);
Y = sim(net,P)


We would like another linear layer that outputs the sequence T given the 
sequence P and two initial input delay states Pi. 
P = {1 2 1 3 3 2};
Pi = {1 3};
T = {5.0 6.1 4.0 6.0 6.9 8.0};
net = newlind(P,T,Pi);
Y = sim(net,P,Pi)


We would like a linear network with two outputs Y1 and Y2 that generate
sequences T1 and T2, given the sequences P1 and P2 with three initial input
delay states Pi1 for input 1, and three initial delay states Pi2 for input 2.


P1 = {1 2 1 3 3 2}; Pi1 = {1 3 0};
P2 = {1 2 1 1 2 1}; Pi2 = {2 1 2};
T1 = {5.0 6.1 4.0 6.0 6.9 8.0};
T2 = {11.0 12.1 10.1 10.9 13.0 13.0};

net = newlind([P1; P2],[T1; T2],[Pi1; Pi2]); 
Y = sim(net,[P1; P2],[Pi1; Pi2]); 


Y1 = Y(1,:)
Y2 = Y(2,:)
Algorithm newlind calculates weight W and bias B values for a linear layer from inputs P
and targets T by solving this linear equation in the least squares sense:

[W B] * [P; ones] = T
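For a single input and a single output, solving [W B] * [P; ones] = T in the least squares sense is ordinary straight-line regression. The Python sketch below (a hypothetical helper covering only this scalar case, not the general matrix solve newlind performs) computes the slope and intercept in closed form. For the first example above, P = [1 2 3] and T = [2.0 4.1 5.9], it gives W = 1.95 and B = 0.1.

```python
def design_linear(P, T):
    # Least squares fit of T ~ W*P + B for scalar inputs and targets
    # (closed-form simple linear regression).
    q = len(P)
    mean_p = sum(P) / q
    mean_t = sum(T) / q
    sxy = sum((p - mean_p) * (t - mean_t) for p, t in zip(P, T))
    sxx = sum((p - mean_p) ** 2 for p in P)   # must be nonzero: inputs not all equal
    W = sxy / sxx
    B = mean_t - W * mean_p
    return W, B


P = [1, 2, 3]
T = [2.0, 4.1, 5.9]
W, B = design_linear(P, T)
print(W, B, [W * p + B for p in P])
```

The designed layer then outputs approximately 2.05, 4.0 and 5.95 for the three inputs.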


See Also sim, newlin
Purpose


Syntax


Description


Properties 


Examples 


Create a learning vector quantization network 


net = newlvq


net = newlvq(PR,S1,PC,LR,LF)


Learning vector quantization (LVQ) networks are used to solve classification
problems.


net = newlvq creates a new network with a dialog box. 
net = newlvq(PR,S1,PC,LR,LF) takes these inputs,


PR - R x 2 matrix of min and max values for R input elements.
S1 - Number of hidden neurons.
PC - S2 element vector of typical class percentages.
LR - Learning rate, default = 0.01.
LF - Learning function, default = 'learnlv2'.
and returns a new LVQ network.


The learning function LF can be learnlv1 or learnlv2.


newlvq creates a two-layer network. The first layer uses the compet transfer
function, calculates weighted inputs with negdist, and net input with netsum.
The second layer has purelin neurons, calculates weighted input with dotprod
and net inputs with netsum. Neither layer has biases.


First layer weights are initialized with midpoint. The second layer weights are
set so that each output neuron i has unit weights coming to it from PC(i) 
percent of the hidden neurons. 


Adaption and training are done with trains and trainr, which both update 
the first layer weights with the specified learning function.

The input vectors P and target classes Tc below define a classification problem
to be solved by an LVQ network. 


P = [-3 -2 -2 0 0 0 0 +2 +2 +3;
     0 +1 -1 +2 +1 -1 -2 +1 -1 0];
Tc = [1 1 1 2 2 2 2 1 1 1];
The target classes Tc are converted to target vectors T. Then, an LVQ network
is created (with input ranges obtained from P, four hidden neurons, and class
percentages of 0.6 and 0.4) and is trained.


T = ind2vec(Tc) 
net = newlvq(minmax(P),4,[.6 .4]);
net = train(net,P,T)
The resulting network can be tested.
Y = sim(net,P)
Yc = vec2ind(Y)
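The conversion between class indices and target vectors used above can be sketched in a few lines. The Python functions below stand in for ind2vec and vec2ind rather than reproducing them (the toolbox's ind2vec returns a sparse matrix, while these use plain lists of rows): each class index becomes a column with a single 1 in that class's row, and the reverse step reads back the row of the largest entry in each column.

```python
def ind2vec(indices):
    # Class index k (1-based) in position j -> a 1 in row k of column j; other entries 0.
    n_rows = max(indices)
    return [[1 if k == r + 1 else 0 for k in indices] for r in range(n_rows)]


def vec2ind(vectors):
    # For each column, return the 1-based row of its largest entry (the 1 for one-hot columns).
    n_rows, n_cols = len(vectors), len(vectors[0])
    result = []
    for j in range(n_cols):
        column = [vectors[r][j] for r in range(n_rows)]
        result.append(column.index(max(column)) + 1)
    return result


Tc = [1, 1, 1, 2, 2, 2, 2, 1, 1, 1]   # class indices from the LVQ example
T = ind2vec(Tc)
print(T)
print(vec2ind(T) == Tc)
```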


sim, init, adapt, train, trains, trainr, learnlv1, learnlv2
Purpose


Syntax


Description


Properties 


Examples 


Create a perceptron 


net = newp 
net = newp(PR,S,TF,LF) 
Perceptrons are used to solve simple (i.e. linearly separable) classification
problems. 
net = newp creates a new network with a dialog box. 
net = newp(PR,S,TF,LF) takes these inputs,
PR - R x 2 matrix of min and max values for R input elements.
S - Number of neurons.
TF - Transfer function, default = 'hardlim'.
LF - Learning function, default = 'learnp'.
and returns a new perceptron.

The transfer function TF can be hardlim or hardlims. The learning function LF
can be learnp or learnpn.

Perceptrons consist of a single layer with the dotprod weight function, the 
netsum net input function, and the specified transfer function. 

The layer has a weight from the input and a bias.

Weights and biases are initialized with initzero. 


Adaption and training are done with trains and trainc, which both update 
weight and bias values with the specified learning function. Performance is
measured with mae. 


This code creates a perceptron layer with one two-element input (ranges [0 1]
and [-2 2]) and one neuron. (Supplying only two arguments to newp results in
the default perceptron learning function learnp being used.) 


net = newp([0 1; -2 2],1);
Here we simulate the network for a sequence of inputs P1.


P1 = {[0; 0] [0; 1] [1; 0] [1; 1]};
Y = sim(net,P1)

Here we define a sequence of targets T1 (together P1 and T1 define the operation of
an AND gate), and then let the network adapt for 10 passes through the
sequence. We then simulate the updated network.


T1 = {0 0 0 1};
net.adaptParam.passes = 10;
net = adapt(net,P1,T1);
Y = sim(net,P1)


Now we define a new problem, an OR gate, with batch inputs P2 and targets T2.


P2 = [0 0 1 1; 0 1 0 1];
T2 = [0 1 1 1];


Here we initialize the perceptron (resetting its weight and bias values),
simulate its output, train for a maximum of 20 epochs, and then
simulate it again.


net = init(net);
Y = sim(net,P2)
net.trainParam.epochs = 20
net = train(net,P2,T2);
Y = sim(net,P2)


Perceptrons can classify linearly separable classes in a finite amount of time.
If input vectors have a large variance in their lengths, learnpn can be
faster than learnp.
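The perceptron learning rule behind learnp changes the weights by the error times the input and the bias by the error, where the error is the target minus the hardlim output. The Python sketch below (hypothetical function names; zero starting weights as with initzero; one update per input presentation, as in adapt) trains a two-input perceptron. On the AND and OR problems from the examples above it settles on weights that classify all four input patterns correctly. learnpn, the normalized variant, is not shown.

```python
def hardlim(n):
    # Hard limit transfer function: 1 if n >= 0, else 0.
    return 1 if n >= 0 else 0


def train_perceptron(P, T, passes=10):
    # Sequential perceptron learning: e = t - a, w += e*p, b += e.
    w = [0.0] * len(P[0])   # zero initial weights and bias, as with initzero
    b = 0.0
    for _ in range(passes):
        for p, t in zip(P, T):
            a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
            e = t - a
            w = [wi + e * pi for wi, pi in zip(w, p)]
            b += e
    return w, b


def classify(w, b, P):
    return [hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b) for p in P]


inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
w_and, b_and = train_perceptron(inputs, [0, 0, 0, 1], passes=10)
w_or, b_or = train_perceptron(inputs, [0, 1, 1, 1], passes=20)
print(classify(w_and, b_and, inputs), classify(w_or, b_or, inputs))
```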


sim, init, adapt, train, hardlim, hardlims, learnp, learnpn, trains,
trainc
Purpose


Syntax


Description


Examples 


Algorithm 


Design a probabilistic neural network 


net = newpnn 


net = newpnn(P,T,spread)


Probabilistic neural networks (PNN) are a kind of radial basis network
suitable for classification problems.


net = newpnn creates a new network with a dialog box. 
net = newpnn(P,T,spread) takes two or three arguments,


P - R x Q matrix of Q input vectors.
T - S x Q matrix of Q target class vectors.
spread - Spread of radial basis functions, default = 0.1.


and returns a new probabilistic neural network. 


If spread is near zero, the network will act as a nearest neighbor classifier. As
spread becomes larger, the designed network will take into account several
nearby design vectors.


Here a classification problem is defined with a set of inputs P and class indices
Tc.


P = [1 2 3 4 5 6 7];
Tc = [1 2 3 2 2 3 1];


Here the class indices are converted to target vectors, and a PNN is designed 
and tested. 


T = ind2vec(Tc) 
net = newpnn(P,T);
Y = sim(net,P)
Yc = vec2ind(Y)


newpnn creates a two-layer network. The first layer has radbas neurons, and 
calculates its weighted inputs with dist, and its net input with netprod. The 
second layer has compet neurons, and calculates its weighted input with 
dotprod and its net inputs with netsum. Only the first layer has biases. 
newpnn sets the first layer weights to P, and the first layer biases are all set to
0.8326/spread, resulting in radial basis functions that cross 0.5 at weighted 
inputs of +/- spread. The second layer weights W2 are set to T. 
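Because the first layer stores every design vector and the second layer weights equal T, a PNN decision reduces to summing the radbas activations of each class and picking the largest. The Python sketch below does this for scalar inputs and the class indices Tc used in the example; the helper name is hypothetical and it is not toolbox code. With the default spread of 0.1 the design points are far apart relative to the spread, so each training input is assigned its own class, which illustrates the nearest-neighbor behavior described earlier.

```python
import math


def pnn_classify(P, Tc, p, spread=0.1):
    # Scalar-input sketch of a PNN built from design inputs P with class indices Tc.
    b1 = 0.8326 / spread
    # First layer: one radbas neuron per design input (weights = P, bias = 0.8326/spread).
    a1 = [math.exp(-(abs(p - pj) * b1) ** 2) for pj in P]
    # Second layer weights are the one-hot targets, so each class sums its neurons' outputs.
    n_classes = max(Tc)
    scores = [sum(a for a, c in zip(a1, Tc) if c == k) for k in range(1, n_classes + 1)]
    # compet: the class with the largest score wins (returned as a 1-based index).
    return scores.index(max(scores)) + 1


P = [1, 2, 3, 4, 5, 6, 7]
Tc = [1, 2, 3, 2, 2, 3, 1]
print([pnn_classify(P, Tc, p) for p in P])
```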


sim, ind2vec, vec2ind, newrb, newrbe, newgrnn


Wasserman, P.D., Advanced Methods in Neural Computing, New York: Van
Nostrand Reinhold, pp. 35-55, 1993.
Syntax


Description
Algorithm


Design a radial basis network 


net = newrb 


[net,tr] = newrb(P,T,goal,spread,MN,DF) 


Radial basis networks can be used to approximate functions. newrb adds 
neurons to the hidden layer of a radial basis network until it meets the 
specified mean squared error goal.


net = newrb creates a new network with a dialog box. 
newrb(P,T,goal,spread,MN,DF) takes two to six arguments,
P - R x Q matrix of Q input vectors.
T - S x Q matrix of Q target class vectors.
goal - Mean squared error goal, default = 0.0.
spread - Spread of radial basis functions, default = 1.0.
MN - Maximum number of neurons, default is Q.
DF - Number of neurons to add between displays, default = 25.
and returns a new radial basis network.
The larger spread is, the smoother the function approximation will be. Too
large a spread means a lot of neurons will be required to fit a fast-changing
function. Too small a spread means many neurons will be required to fit a
smooth function, and the network may not generalize well. Call newrb with
different spreads to find the best value for a given problem.


Here we design a radial basis network given inputs P and targets T.


P = [1 2 3];
T = [2.0 4.1 5.9];
net = newrb(P,T);


Here the network is simulated for a new input.


P=1.5; 
Y = sim(net,P)


newrb creates a two-layer network. The first layer has radbas neurons, and
calculates its weighted inputs with dist, and its net input with netprod. The
second layer has purelin neurons, and calculates its weighted input with
dotprod and its net inputs with netsum. Both layers have biases.


Initially the radbas layer has no neurons. The following steps are repeated 
until the network's mean squared error falls below goal.

1 The network is simulated. 

2 The input vector with the greatest error is found. 

3 A radbas neuron is added with weights equal to that vector.

4 The purelin layer weights are redesigned to minimize error. 


sim, newrbe, newgrnn, newpnn
Purpose


Syntax 


Description


Examples 


Algorithm 


Design an exact radial basis network 


net = newrbe 
net = newrbe(P,T,spread ) 
Radial basis networks can be used to approximate functions. newrbe very
quickly designs a radial basis network with zero error on the design vectors.
net = newrbe creates a new network with a dialog box.
newrbe(P,T,spread) takes two or three arguments,

P - R x Q matrix of Q input vectors.
T - S x Q matrix of Q target class vectors.
spread - Spread of radial basis functions, default = 1.0.
and returns a new exact radial basis network.


The larger the spread is, the smoother the function approximation will be. Too 
large a spread can cause numerical problems.


Here we design a radial basis network given inputs P and targets T.


P = [1 2 3];
T = [2.0 4.1 5.9];
net = newrbe(P,T);


Here the network is simulated for a new input. 


P=1.5; 
Y = sim(net,P)


newrbe creates a two-layer network. The first layer has radbas neurons, and 
calculates its weighted inputs with dist, and its net input with netprod. The 
second layer has purelin neurons, and calculates its weighted input with 
dotprod and its net inputs with netsum. Both layers have biases. 


newrbe sets the first layer weights to P, and the first layer biases are all set to 
0.8326/spread, resulting in radial basis functions that cross 0.5 at weighted
inputs of +/- spread. 


The second layer weights IW{2,1} and biases b{2} are found by simulating the 
first layer outputs A{1}, and then solving the following linear expression: 
[W{2,1} b{2}] * [A{1}; ones] = T


See Also sim, newrb, newgrnn, newpnn
Properties
Create a self-organizing map


net = newsom


net = newsom(PR,[D1,D2,...],TFCN,DFCN,OLR,OSTEPS,TLR,TND)


Competitive layers are used to solve classification problems. 
net = newsom creates a new network with a dialog box. 
net = newsom(PR,[D1,D2,...],TFCN,DFCN,OLR,OSTEPS,TLR,TND) takes,


PR - R x 2 matrix of min and max values for R input elements.
Di - Size of ith layer dimension, defaults = [5 8].
TFCN - Topology function, default = 'hextop'.
DFCN - Distance function, default = 'linkdist'.
OLR - Ordering phase learning rate, default = 0.9.
OSTEPS - Ordering phase steps, default = 1000.
TLR - Tuning phase learning rate, default = 0.02.
TND - Tuning phase neighborhood distance, default = 1.
and returns a new self-organizing map.


The topology function TFCN can be hextop, gridtop, or randtop. The distance
function DFCN can be linkdist, dist, or mandist.


Self-organizing maps (SOM) consist of a single layer with the negdist weight 
function, netsum net input function, and the compet transfer function. 


The layer has a weight from the input, but no bias. The weight is initialized
with midpoint. 


Adaption and training are done with trains and trainr, which both update
the weight with learnsom.


The input vectors defined below are distributed over a two-dimensional input
space varying over [0 2] and [0 1]. This data will be used to train a SOM with
dimensions [3 5].

P = [rand(1,400)*2; rand(1,400)];

net = newsom([0 2; 0 1],[3 5]); 
plotsom(net.layers{1}.positions)


Here the SOM is trained and the input vectors are plotted with the map that 
the SOM's weights have formed.


net = train(net,P) 
plot(P(1,:),P(2,:),'.g','markersize',20)
hold on
plotsom(net.iw{1,1},net.layers{1}.distances)
hold off 

See Also sim, init, adapt, train
Copy matrix or cell array
nncopy(X,M,N) 


nncopy(X,M,N) takes three arguments,


X - R x C matrix (or cell array).
M - Number of vertical copies.
N - Number of horizontal copies.
and returns a new (R*M) x (C*N) matrix (or cell array).


x1 = [1 2 3; 4 5 6];
y1 = nncopy(x1,3,2)
x2 = {[1 2]; [3; 4; 5]}
y2 = nncopy(x2,2,3)
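The tiling that nncopy performs can be sketched in Python by repeating each row N times across and the whole list of rows M times down. This hypothetical helper works on a list of rows whose entries may be numbers or other objects (standing in for cell array contents); it only illustrates the (R*M) x (C*N) shape rule and is not the toolbox implementation.

```python
def nncopy(X, M, N):
    # Repeat each row of X N times side by side, then stack the rows M times,
    # giving an (R*M) x (C*N) result. Entries can be any objects (like cell contents).
    return [list(row) * N for _ in range(M) for row in X]


x1 = [[1, 2, 3], [4, 5, 6]]
y1 = nncopy(x1, 3, 2)
for row in y1:
    print(row)
```

Because the entries are repeated by reference, copies of a mutable entry are shared; that does not matter for numbers but would for nested lists.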
Purpose Update NNT 2.0 competitive layer

Syntax net = nnt2c(PR,W,KLR,CLR) 

Description nnt2c(PR,W,KLR,CLR) takes these arguments,
PR - R x 2 matrix of min and max values for R input elements.
W - S x R weight matrix.
KLR - Kohonen learning rate, default = 0.01.
CLR - Conscience learning rate, default = 0.001.


and returns a competitive layer. 


Once a network has been updated, it can be simulated, initialized, or trained
with sim, init, adapt, and train. 


See Also newc 
Purpose


Syntax


Description
Update NNT 2.0 Elman backpropagation network
net = nnt2elm(PR,W1,B1,W2,B2,BTF,BLF,PF) 


nnt2elm(PR,W1,B1,W2,B2,BTF,BLF,PF) takes these arguments,


PR - R x 2 matrix of min and max values for R input elements.
W1 - S1 x (R+S1) weight matrix.
B1 - S1 x 1 bias vector.
W2 - S2 x S1 weight matrix.
B2 - S2 x 1 bias vector.
BTF - Backpropagation network training function, default = 'traingdx'.
BLF - Backpropagation weight/bias learning function, default = 'learngdm'.
PF - Performance function, default = 'mse'.
and returns an Elman network.
The training function BTF can be any of the backpropagation training functions
such as traingd, traingdm, traingda, and traingdx. Large step-size
algorithms, such as trainlm, are not recommended for Elman networks.


The learning function BLF can be either of the backpropagation learning
functions such as learngd or learngdm.


The performance function can be any of the differentiable performance
functions such as mse or msereg.


Once a network has been updated, it can be simulated, initialized, adapted, or 
trained with sim, init, adapt, and train. 


newelm 
nnt2ff





Purpose
Syntax 


Description


See Also 
Update NNT 2.0 feed-forward network
net = nnt2ff(PR,{W1 W2 ...},{B1 B2 ...},{TF1 TF2 ...},BTF,BLF,PF)


nnt2ff(PR,{W1 W2 ...},{B1 B2 ...},{TF1 TF2 ...},BTF,BLF,PF) takes
these arguments,
PR - R x 2 matrix of min and max values for R input elements.
Wi - Weight matrix for the ith layer.
Bi - Bias vector for the ith layer.
TFi - Transfer function of ith layer, default = 'tansig'.
BTF - Backpropagation network training function, default = 'traingdx'.
BLF - Backpropagation weight/bias learning function, default = 'learngdm'.
PF - Performance function, default = 'mse'.
and returns a feed-forward network.


The training function BTF can be any of the backpropagation training functions
such as traingd, traingdm, traingda, traingdx or trainlm.


The learning function BLF can be either of the backpropagation learning
functions such as learngd or learngdm.


The performance function can be any of the differentiable performance
functions such as mse or msereg.


Once a network has been updated, it can be simulated, initialized, adapted, or
trained with sim, init, adapt, and train. 


newff, newcf, newfftd, newelm


nnt2hop 





Purpose 
Syntax


Description


See Also 


Update NNT 2.0 Hopfield recurrent network 
net = nnt2hop(W,B)


nnt2hop(W,B) takes these arguments,


W - S x S weight matrix.
B - S x 1 bias vector.

and returns a Hopfield recurrent network.


Once a network has been updated, it can be simulated, initialized, adapted, or 
trained with sim, init, adapt, and train. 


newhop 
nnt2lin





Purpose Update NNT 2.0 linear layer
Syntax net = nnt2lin(PR,W,B,LR)
Description nnt2lin(PR,W,B,LR) takes these arguments,


PR - R x 2 matrix of min and max values for R input elements.
W - S x R weight matrix.
B - S x 1 bias vector.
LR - Learning rate, default = 0.01.
and returns a linear layer.


Once a network has been updated, it can be simulated, initialized, adapted, or
trained with sim, init, adapt, and train. 


See Also newlin 
nnt2lvq





Purpose 
Syntax


Description


See Also 


Update NNT 2.0 learning vector quantization network
net = nnt2lvq(PR,W1,W2,LR,LF)


nnt2lvq(PR,W1,W2,LR,LF) takes these arguments,


PR - R x 2 matrix of min and max values for R input elements.
W1 - S1 x R weight matrix.
W2 - S2 x S1 weight matrix.
LR - Learning rate, default = 0.01.
LF - Learning function, default = 'learnlv2'.


and returns an LVQ network.
The learning function LF can be learnlv1 or learnlv2. 


Once a network has been updated, it can be simulated, initialized, adapted, or 
trained with sim, init, adapt, and train. 


newlvq
nnt2p





Purpose
Syntax 


Description


See Also 




Update NNT 2.0 perceptron 
net = nnt2p(PR,W,B,TF,LF) 


nnt2p(PR,W,B,TF,LF) takes these arguments,


PR - R x 2 matrix of min and max values for R input elements.
W - S x R weight matrix.
B - S x 1 bias vector.
TF - Transfer function, default = 'hardlim'.
LF - Learning function, default = 'learnp'.
and returns a perceptron.


The transfer function TF can be hardlim or hardlims. The learning function LF
can be learnp or learnpn.


Once a network has been updated, it can be simulated, initialized, adapted, or
trained with sim, init, adapt, and train. 


newp 


nnt2rb





Purpose 
Syntax


Description


See Also 


Update NNT 2.0 radial basis network 
net = nnt2rb(PR,W1,B1,W2,B2) 


nnt2rb(PR,W1,B1,W2,B2) takes these arguments,


PR - R x 2 matrix of min and max values for R input elements.
W1 - S1 x R weight matrix.
B1 - S1 x 1 bias vector.
W2 - S2 x S1 weight matrix.
B2 - S2 x 1 bias vector.
and returns a radial basis network.


Once a network has been updated, it can be simulated, initialized, adapted, or 
trained with sim, init, adapt, and train. 


newrb, newrbe, newgrnn, newpnn
nnt2som





Purpose Update NNT 2.0 self-organizing map

Syntax net = nnt2som(PR,[D1,D2,...],W,OLR,OSTEPS,TLR,TND) 

Description nnt2som(PR,[D1,D2,...],W,OLR,OSTEPS,TLR,TND) takes these arguments,
PR - R x 2 matrix of min and max values for R input elements.
Di - Size of ith layer dimension.
W - S x R weight matrix.
OLR - Ordering phase learning rate, default = 0.9.
OSTEPS - Ordering phase steps, default = 1000.
TLR - Tuning phase learning rate, default = 0.02.
TND - Tuning phase neighborhood distance, default = 1.
and returns a self-organizing map.
nnt2som assumes that the self-organizing map has a grid topology (gridtop)
using link distances (linkdist). This corresponds with the neighborhood
function in NNT 2.0.


The new network will only output 1 for the neuron with the greatest net input.
In NNT 2.0 the network would also output 0.5 for that neuron's neighbors. 


Once a network has been updated, it can be simulated, initialized, adapted, or
trained with sim, init, adapt, and train. 


See Also newsom 
nntool





Purpose Neural Network Tool - Graphical User Interface 
Syntax nntool
Description nntool opens the Network/Data Manager window, which allows you to import,
create, use, and export neural networks and data.
normc





Purpose
Syntax 
Description 


Examples 


See Also 




Normalize the columns of a matrix 
normc(M) 
normc(M) normalizes the columns of M to a length of 1. 


m = [1 2; 3 4];
normc(m)
ans = 
0.3162 0.4472 
0.9487 0.8944 
normr 
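Normalizing columns means dividing every entry by the Euclidean length of its column. The short Python sketch below (a hypothetical helper, matrices as lists of rows, with no handling of all-zero columns) reproduces the ans values shown in the example above when rounded to four places.

```python
import math


def normc(M):
    # Divide each column of M (a list of rows) by its Euclidean length.
    # All-zero columns are not handled (division by zero).
    n_cols = len(M[0])
    lengths = [math.sqrt(sum(row[j] ** 2 for row in M)) for j in range(n_cols)]
    return [[row[j] / lengths[j] for j in range(n_cols)] for row in M]


m = [[1, 2], [3, 4]]
for row in normc(m):
    print([round(v, 4) for v in row])
```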


normprod 





Purpose 


Syntax


Description


Examples 


Network Use 


Algorithm 


See Also 


Normalized dot product weight function 


Z = normprod(W,P)

df = normprod('deriv')

normprod is a weight function. Weight functions apply weights to an input to 
get weighted inputs. 

normprod(W,P) takes these inputs,


W - S x R weight matrix.
P - R x Q matrix of Q input (column) vectors.
and returns the S x Q matrix of normalized dot products.
normprod('deriv') returns '' because normprod does not have a derivative
function.
Here we define arandom weight matrix Wand input vector P and calculate the 
corresponding weighted input Z. 

W= rand(4,3) 

P = rand(3,1); 

Z= normprod(W,P) 
You can create a standard network that uses normprod by calling newgrnn. 


To change a network so an input weight uses normprod, set 
net.inputWeight{fi,j}y.weightFcnto 'normprod . For a layer weight, set 
net.inputWeight{I,j}.weightFcnto normprod . 


In either case call simto simulate the network with normprod.See newgrnn for 
Simulation examples. 


normprod returns the dot product normalized by the sum of the input vector 
elements, 


Z = Wx*p/sum(p) 


Slimn，dotprod，negdist，dist 
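The Algorithm formula z = w*p/sum(p) can be sketched in Python for a batch of input columns. Dividing each column's dot products by that column's own element sum is an assumption about how the single-vector formula extends to Q > 1; inputs whose elements sum to zero are not handled.

```python
def normprod(w, p):
    """Normalized dot product: z[s][k] = (W row s) . (p column k) / sum(p column k).

    w is an S x R list of rows, p is an R x Q list of rows.
    """
    r, q = len(p), len(p[0])
    col_sums = [sum(p[i][k] for i in range(r)) for k in range(q)]
    return [[sum(w[s][i] * p[i][k] for i in range(r)) / col_sums[k]
             for k in range(q)]
            for s in range(len(w))]

W = [[1, 2, 3], [0, 1, 0]]
P = [[1], [1], [2]]
print(normprod(W, P))  # [[2.25], [0.25]]
```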
normr

Purpose Normalize the rows of a matrix

Syntax normr(M)

Description normr(M) normalizes the rows of M to a length of 1.

Examples m = [1 2; 3 4];
normr(m)
ans =
    0.4472    0.8944
    0.6000    0.8000

See Also normc
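The row-wise counterpart can be sketched the same way: compute each row's Euclidean length once and divide that row by it. This is an illustrative Python sketch, not the toolbox code; zero rows are not handled.

```python
import math

def normr(m):
    """Scale each row of m (a list of rows) to unit Euclidean length."""
    out = []
    for row in m:
        length = math.sqrt(sum(v * v for v in row))
        out.append([x / length for x in row])
    return out

print(normr([[1, 2], [3, 4]]))  # approx [[0.4472, 0.8944], [0.6, 0.8]]
```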


plotbr

Purpose Plot network performance for Bayesian regularization training

Syntax plotbr(TR,name,epoch)

Description plotbr(TR,name,epoch) takes these inputs,
TR - Training record returned by train.
name - Training function name, default = ''.
epoch - Number of epochs, default = length of training record.
and plots the training sum squared error, the sum squared weight, and the effective number of parameters.

Examples Here are input values P and associated targets T.
p = [-1:.05:1];
t = sin(2*pi*p)+0.1*randn(size(p));

The code below creates a network and trains it on this problem.
net = newff([-1 1],[20,1],{'tansig','purelin'},'trainbr');
[net,tr] = train(net,p,t);

During training plotbr was called to display the training record. You can also call plotbr directly with the final training record TR, as shown below.
plotbr(tr)
plotep

Purpose Plot a weight-bias position on an error surface

Syntax h = plotep(W,B,E)
h = plotep(W,B,E,H)

Description plotep is used to show network learning on a plot already created by plotes.

plotep(W,B,E) takes these arguments,
W - Current weight value.
B - Current bias value.
E - Current error.
and returns a vector H, containing information for continuing the plot.

plotep(W,B,E,H) continues plotting using the vector H returned by the last call to plotep.

H contains handles to dots plotted on the error surface, so they can be deleted next time, as well as points on the error contour, so they can be connected.

See Also errsurf, plotes


plotes

Purpose Plot the error surface of a single input neuron

Syntax plotes(WV,BV,ES,V)

Description plotes(WV,BV,ES,V) takes these arguments,
WV - 1 x N row vector of values of W.
BV - 1 x M row vector of values of B.
ES - M x N matrix of error vectors.
V - View, default = [-37.5, 30].
and plots the error surface with a contour underneath.

Calculate the error surface ES with errsurf.

Examples p = [3 2];
t = [0.4 0.8];
wv = -4:0.4:4; bv = wv;
ES = errsurf(p,t,wv,bv,'logsig');
plotes(wv,bv,ES,[60 30])

See Also errsurf
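plotes only draws a surface that errsurf has already computed. To make the M x N shape of ES concrete, here is a Python sketch of such a grid for a single logsig neuron. It assumes the error at each (w, b) point is the sum of squared errors over the input/target pairs; that is an assumption about errsurf, not something stated on this page. Rows index the bias values and columns the weight values, matching the ES layout listed above.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))

def error_surface(p, t, wv, bv):
    """es[i][j] = sum squared error of logsig(w*p + b) for bias bv[i] and weight wv[j]."""
    return [[sum((tk - logsig(w * pk + b)) ** 2 for pk, tk in zip(p, t))
             for w in wv]
            for b in bv]

p, t = [3, 2], [0.4, 0.8]
wv = [round(-4 + 0.4 * k, 1) for k in range(21)]  # same values as -4:0.4:4
bv = wv
es = error_surface(p, t, wv, bv)
print(len(es), len(es[0]))  # 21 21
```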
plotpc

Purpose Plot a classification line on a perceptron vector plot

Syntax plotpc(W,B)
plotpc(W,B,H)

Description plotpc(W,B) takes these inputs,
W - S x R weight matrix (R must be 3 or less).
B - S x 1 bias vector.
and returns a handle to a plotted classification line.

plotpc(W,B,H) takes an additional input,
H - Handle to last plotted line.
and deletes the last line before plotting the new one.

This function does not change the current axis and is intended to be called after plotpv.

Examples The code below defines and plots the inputs and targets for a perceptron:
p = [0 0 1 1; 0 1 0 1];
t = [0 0 0 1];
plotpv(p,t)

The following code creates a perceptron with inputs ranging over the values in P, assigns values to its weights and biases, and plots the resulting classification line.
net = newp(minmax(p),1);
net.iw{1,1} = [-1.2 -0.5];
net.b{1} = 1;
plotpc(net.iw{1,1},net.b{1})

See Also plotpv
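For a single two-input neuron, the line that plotpc draws is the set of points where the net input w1*x + w2*y + b equals zero. The Python sketch below, which assumes R = 2 and a nonzero second weight, computes that line and the hard-limit output of a point (1 when the net input is >= 0, as for a perceptron), using the weights and bias from the example.

```python
def boundary_y(w, b, x):
    """y-coordinate of the decision line w[0]*x + w[1]*y + b = 0 (requires w[1] != 0)."""
    return -(w[0] * x + b) / w[1]

def classify(w, b, point):
    """Hard-limit output: 1 if the net input is >= 0, else 0."""
    n = w[0] * point[0] + w[1] * point[1] + b
    return 1 if n >= 0 else 0

w, b = [-1.2, -0.5], 1.0
print(boundary_y(w, b, 0.0), boundary_y(w, b, 1.0))  # 2.0 and approx -0.4
print([classify(w, b, pt) for pt in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [1, 1, 0, 0]
```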


plotperf

Purpose Plot network performance

Syntax plotperf(TR,goal,name,epoch)

Description plotperf(TR,goal,name,epoch) takes these inputs,
TR - Training record returned by train.
goal - Performance goal, default = NaN.
name - Training function name, default = ''.
epoch - Number of epochs, default = length of training record.
and plots the training performance and, if available, the performance goal, validation performance, and test performance.

Examples Here are eight input values P and associated targets T, plus a like number of validation inputs VV.P and targets VV.T.
P = 1:8; T = sin(P);
VV.P = P; VV.T = T+rand(1,8)*0.1;

The code below creates a network and trains it on this problem.
net = newff(minmax(P),[4 1],{'tansig','tansig'});
[net,tr] = train(net,P,T,[],[],VV);

During training plotperf was called to display the training record. You can also call plotperf directly with the final training record TR, as shown below.
plotperf(tr)
plotpv

Purpose Plot perceptron input/target vectors

Syntax plotpv(P,T)
plotpv(P,T,V)

Description plotpv(P,T) takes these inputs,
P - R x Q matrix of input vectors (R must be 3 or less).
T - S x Q matrix of binary target vectors (S must be 3 or less).
and plots column vectors in P with markers based on T.

plotpv(P,T,V) takes an additional input,
V - Graph limits = [x_min x_max y_min y_max]
and plots the column vectors with limits set by V.

Examples The code below defines and plots the inputs and targets for a perceptron:
p = [0 0 1 1; 0 1 0 1];
t = [0 0 0 1];
plotpv(p,t)

The following code creates a perceptron with inputs ranging over the values in P, assigns values to its weights and biases, and plots the resulting classification line.
net = newp(minmax(p),1);
net.iw{1,1} = [-1.2 -0.5];
net.b{1} = 1;
plotpc(net.iw{1,1},net.b{1})

See Also plotpc


plotsom

Purpose Plot self-organizing map

Syntax plotsom(pos)
plotsom(W,D,ND)

Description plotsom(pos) takes one argument,
POS - N x S matrix of S N-dimension neural positions.
and plots the neuron positions with red dots, linking the neurons within a Euclidean distance of 1.

plotsom(W,D,ND) takes three arguments,
W - S x R weight matrix.
D - S x S distance matrix.
ND - Neighborhood distance, default = 1.
and plots the neuron's weight vectors with connections between weight vectors whose neurons are within a distance of 1.

Examples Here are plots of various layer topologies:
pos = hextop(5,6); plotsom(pos)
pos = gridtop(4,5); plotsom(pos)
pos = randtop(18,12); plotsom(pos)
pos = gridtop(4,5,2); plotsom(pos)
pos = hextop(4,4,3); plotsom(pos)

See newsom for an example of plotting a layer's weight vectors with the input vectors they map.

See Also newsom, learnsom, initsom
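plotsom(pos) links neurons whose positions lie within a Euclidean distance of 1. The pairing rule can be sketched in Python as below; drawing is left out, and the small tolerance for floating-point positions is an assumption of this sketch rather than documented toolbox behavior.

```python
import math
from itertools import combinations

def som_links(pos, max_dist=1.0, tol=1e-9):
    """Return index pairs (i, j) of neurons whose positions are within max_dist.

    pos is a list of coordinate tuples, one per neuron (the columns of POS).
    """
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(pos), 2)
            if math.dist(a, b) <= max_dist + tol]

# A 2-by-2 grid: edge neighbors are linked, diagonal ones (distance sqrt(2)) are not.
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(som_links(grid))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```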
plotv

Purpose Plot vectors as lines from the origin

Syntax plotv(M,T)

Description plotv(M,T) takes two inputs,
M - R x Q matrix of Q column vectors with R elements.
T - (optional) the line plotting type, default = '-'.
and plots the column vectors of M.

R must be 2 or greater. If R is greater than two, only the first two rows of M are used for the plot.

Examples plotv([-.4 0.7 .2; -0.5 .1 0.5],'-')


plotvec

Purpose Plot vectors with different colors

Syntax plotvec(X,C,M)

Description plotvec(X,C,M) takes these inputs,
X - Matrix of (column) vectors.
C - Row vector of color coordinates.
M - Marker, default = '+'.
and plots each ith vector in X with a marker M and using the ith value in C as the color coordinate.

plotvec(X) only takes a matrix X and plots each ith vector in X with marker '+' using the index i as the color coordinate.

Examples x = [0 1 0.5 0.7; -1 2 0.5 0.1];
c = [1 2 3 4];
plotvec(x,c)
pnormc

Purpose Pseudo-normalize columns of a matrix

Syntax pnormc(X,R)

Description pnormc(X,R) takes these arguments,
X - M x N matrix.
R - (optional) radius to normalize columns to, default = 1.
and returns X with an additional row of elements, which results in new column vector lengths of R.

Caution: For this function to work properly the columns of X must originally have vector lengths less than R.

Examples x = [0.1 0.6; 0.3 0.1];
y = pnormc(x)

See Also normc, normr
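The Description implies what the extra row must hold: for a column c, appending sqrt(R^2 - |c|^2) makes the new column length exactly R, which is also why the Caution requires |c| < R (otherwise the square root has no real value). A Python sketch under that reading; the function name only mirrors the MATLAB one.

```python
import math

def pnormc(x, r=1.0):
    """Append one row so that every column of x (a list of rows) has length r.

    Assumes each original column length is less than r, per the Caution.
    """
    cols = list(zip(*x))
    extra = [math.sqrt(r * r - sum(v * v for v in col)) for col in cols]
    return [list(row) for row in x] + [extra]

y = pnormc([[0.1, 0.6], [0.3, 0.1]])
print(y)  # last row approx [0.9487, 0.7937]
```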


poslin

Purpose Positive linear transfer function

Graph and Symbol [Figure: Positive Linear Transfer Function, a = poslin(n)]

Syntax A = poslin(N)
info = poslin(code)

Description poslin is a transfer function. Transfer functions calculate a layer's output from its net input.

poslin(N) takes one input,
N - S x Q matrix of net input (column) vectors.
and returns the maximum of 0 and each element of N.

poslin(code) returns useful information for each code string:
'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.

Examples Here is the code to create a plot of the poslin transfer function.
n = -5:0.1:5;
a = poslin(n);
plot(n,a)

Network Use To change a network so that a layer uses poslin, set net.layers{i}.transferFcn to 'poslin'.

Call sim to simulate the network with poslin.

Algorithm The transfer function poslin returns the output n if n is greater than or equal to zero and 0 if n is less than or equal to zero.
poslin(n) = n, if n >= 0; = 0, if n <= 0.

See Also sim, purelin, satlin, satlins
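The Algorithm line amounts to clipping negative net inputs to zero. A minimal Python sketch, operating on a flat list that stands in for one row of N (not the toolbox implementation):

```python
def poslin(n):
    """Positive linear transfer: n where n >= 0, otherwise 0 (elementwise)."""
    return [max(0.0, v) for v in n]

n = [k / 10 for k in range(-50, 51)]  # same points as -5:0.1:5
a = poslin(n)
print(a[0], a[50], a[-1])  # 0.0 0.0 5.0
```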


postmnmx

Purpose Postprocess data that has been preprocessed by premnmx

Syntax [P,T] = postmnmx(PN,minp,maxp,TN,mint,maxt)
[p] = postmnmx(PN,minp,maxp)

Description postmnmx postprocesses the network training set that was preprocessed by premnmx. It converts the data back into unnormalized units.

postmnmx takes these inputs,
PN - R x Q matrix of normalized input vectors.
minp - R x 1 vector containing minimums for each P.
maxp - R x 1 vector containing maximums for each P.
TN - S x Q matrix of normalized target vectors.
mint - S x 1 vector containing minimums for each T.
maxt - S x 1 vector containing maximums for each T.
and returns,
P - R x Q matrix of input (column) vectors.
T - S x Q matrix of target vectors.

Examples In this example we normalize a set of training data with premnmx, create and train a network using the normalized data, simulate the network, unnormalize the output of the network using postmnmx, and perform a linear regression between the network outputs (unnormalized) and the targets to check the quality of the network training.
p = [-0.92 0.73 -0.47 0.74 0.29; -0.08 0.86 -0.67 -0.52 0.93];
t = [-0.08 3.4 -0.82 0.69 3.1];
[pn,minp,maxp,tn,mint,maxt] = premnmx(p,t);
net = newff(minmax(pn),[5 1],{'tansig','purelin'},'trainlm');
net = train(net,pn,tn);
an = sim(net,pn);
a = postmnmx(an,mint,maxt);
[m,b,r] = postreg(a,t);

Algorithm p = 0.5*(pn+1)*(maxp-minp) + minp;

See Also premnmx, prepca, poststd
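The Algorithm line maps values in [-1, 1] back to [minp, maxp], one row (one input element) at a time. A Python sketch for a single row, with minp and maxp as plain numbers; the name postmnmx_row is hypothetical.

```python
def postmnmx_row(pn, minp, maxp):
    """Undo [-1, 1] scaling for one row: p = 0.5*(pn + 1)*(maxp - minp) + minp."""
    return [0.5 * (v + 1) * (maxp - minp) + minp for v in pn]

print(postmnmx_row([-1.0, 0.0, 1.0], -10.0, 10.0))  # [-10.0, 0.0, 10.0]
```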
postreg

Purpose Postprocess the trained network response with a linear regression

Syntax [M,B,R] = postreg(A,T)

Description postreg postprocesses the network training set by performing a linear regression between each element of the network response and the corresponding target.

postreg(A,T) takes these inputs,
A - 1 x Q array of network outputs. One element of the network output.
T - 1 x Q array of targets. One element of the target vector.
and returns,
M - Slope of the linear regression.
B - Y intercept of the linear regression.
R - Regression R-value. R = 1 means perfect correlation.

Examples In this example we normalize a set of training data with prestd, perform a principal component transformation on the normalized data, create and train a network using the pca data, simulate the network, unnormalize the output of the network using poststd, and perform a linear regression between the network outputs (unnormalized) and the targets to check the quality of the network training.
p = [-0.92 0.73 -0.47 0.74 0.29; -0.08 0.86 -0.67 -0.52 0.93];
t = [-0.08 3.4 -0.82 0.69 3.1];
[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);
[ptrans,transMat] = prepca(pn,0.02);
net = newff(minmax(ptrans),[5 1],{'tansig','purelin'},'trainlm');
net = train(net,ptrans,tn);
an = sim(net,ptrans);
a = poststd(an,meant,stdt);
[m,b,r] = postreg(a,t)

Algorithm Performs a linear regression between the network response and the target, and then computes the correlation coefficient (R-value) between the network response and the target.

See Also premnmx, prepca
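The Algorithm paragraph combines an ordinary least-squares line with the Pearson correlation coefficient. The Python sketch below assumes the fitted line expresses the outputs in terms of the targets (A approx M*T + B); the page itself does not say which variable is regressed on which.

```python
import math

def postreg(a, t):
    """Least-squares fit a ~ m*t + b, plus the correlation coefficient r of a and t."""
    q = len(t)
    mean_a, mean_t = sum(a) / q, sum(t) / q
    s_tt = sum((x - mean_t) ** 2 for x in t)
    s_aa = sum((y - mean_a) ** 2 for y in a)
    s_ta = sum((x - mean_t) * (y - mean_a) for x, y in zip(t, a))
    m = s_ta / s_tt
    b = mean_a - m * mean_t
    r = s_ta / math.sqrt(s_tt * s_aa)
    return m, b, r

t = [-0.08, 3.4, -0.82, 0.69, 3.1]
a = [2 * x + 1 for x in t]  # a perfectly linear stand-in for a network response
print(postreg(a, t))  # approx (2.0, 1.0, 1.0)
```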


poststd

Purpose Postprocess data which has been preprocessed by prestd

Syntax [P,T] = poststd(PN,meanp,stdp,TN,meant,stdt)
[p] = poststd(PN,meanp,stdp)

Description poststd postprocesses the network training set that was preprocessed by prestd. It converts the data back into unnormalized units.

poststd takes these inputs,
PN - R x Q matrix of normalized input vectors.
meanp - R x 1 vector containing means for each P.
stdp - R x 1 vector containing standard deviations for each P.
TN - S x Q matrix of normalized target vectors.
meant - S x 1 vector containing means for each T.
stdt - S x 1 vector containing standard deviations for each T.
and returns,
P - R x Q matrix of input (column) vectors.
T - S x Q matrix of target vectors.

Examples In this example we normalize a set of training data with prestd, create and train a network using the normalized data, simulate the network, unnormalize the output of the network using poststd, and perform a linear regression between the network outputs (unnormalized) and the targets to check the quality of the network training.
p = [-0.92 0.73 -0.47 0.74 0.29; -0.08 0.86 -0.67 -0.52 0.93];
t = [-0.08 3.4 -0.82 0.69 3.1];
[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);
net = newff(minmax(pn),[5 1],{'tansig','purelin'},'trainlm');
net = train(net,pn,tn);
an = sim(net,pn);
a = poststd(an,meant,stdt);
[m,b,r] = postreg(a,t);

Algorithm p = stdp*pn + meanp;

See Also premnmx, prepca, postmnmx, prestd
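The Algorithm line undoes z-score normalization by scaling each row back by its standard deviation and shifting it by its mean. A Python sketch for one row; the name poststd_row is hypothetical.

```python
def poststd_row(pn, meanp, stdp):
    """Undo zero-mean/unit-deviation scaling for one row: p = stdp*pn + meanp."""
    return [stdp * v + meanp for v in pn]

print(poststd_row([-1.0, 0.0, 2.0], 5.0, 2.0))  # [3.0, 5.0, 9.0]
```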
premnmx

Purpose Preprocess data so that minimum is -1 and maximum is 1

Syntax [PN,minp,maxp,TN,mint,maxt] = premnmx(P,T)
[PN,minp,maxp] = premnmx(P)

Description premnmx preprocesses the network training set by normalizing the inputs and targets so that they fall in the interval [-1,1].

premnmx(P,T) takes these inputs,
P - R x Q matrix of input (column) vectors.
T - S x Q matrix of target vectors.
and returns,
PN - R x Q matrix of normalized input vectors.
minp - R x 1 vector containing minimums for each P.
maxp - R x 1 vector containing maximums for each P.
TN - S x Q matrix of normalized target vectors.
mint - S x 1 vector containing minimums for each T.
maxt - S x 1 vector containing maximums for each T.

Examples Here is the code to normalize a given data set so that the inputs and targets will fall in the range [-1,1].
p = [-10 -7.5 -5 -2.5 0 2.5 5 7.5 10];
t = [0 7.07 -10 -7.07 0 7.07 10 7.07 0];
[pn,minp,maxp,tn,mint,maxt] = premnmx(p,t);

If you just want to normalize the input,
[pn,minp,maxp] = premnmx(p);

Algorithm pn = 2*(p-minp)/(maxp-minp) - 1;

See Also prestd, prepca, postmnmx
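The Algorithm line is a linear map that sends each row's minimum to -1 and its maximum to 1. A Python sketch for one row, using the input values from the example; the name premnmx_row is hypothetical and a constant row (maxp == minp) is not handled.

```python
def premnmx_row(p):
    """Scale one row into [-1, 1]: pn = 2*(p - minp)/(maxp - minp) - 1.

    Returns (pn, minp, maxp).
    """
    minp, maxp = min(p), max(p)
    pn = [2 * (v - minp) / (maxp - minp) - 1 for v in p]
    return pn, minp, maxp

p = [-10, -7.5, -5, -2.5, 0, 2.5, 5, 7.5, 10]
pn, minp, maxp = premnmx_row(p)
print(pn)  # [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]
```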


prepca

Purpose Principal component analysis

Syntax [ptrans,transMat] = prepca(P,min_frac)

Description prepca preprocesses the network input training set by applying a principal component analysis. This analysis transforms the input data so that the elements of the input vector set will be uncorrelated. In addition, the size of the input vectors may be reduced by retaining only those components which contribute more than a specified fraction (min_frac) of the total variation in the data set.

prepca(P,min_frac) takes these inputs,
P - R x Q matrix of centered input (column) vectors.
min_frac - Minimum fraction variance component to keep.
and returns,
ptrans - Transformed data set.
transMat - Transformation matrix.

Examples Here is the code to perform a principal component analysis and retain only those components that contribute more than two percent to the variance in the data set. prestd is called first to create zero mean data, which is needed for prepca.
p = [-1.5 -0.58 0.21 -0.96 -0.79; -2.2 -0.87 0.31 -1.4 -1.2];
[pn,meanp,stdp] = prestd(p);
[ptrans,transMat] = prepca(pn,0.02);

Since the second row of p is almost a multiple of the first row, this example will produce a transformed data set that contains only one row.

Algorithm This routine uses singular value decomposition to compute the principal components. The input vectors are multiplied by a matrix whose rows consist of the eigenvectors of the input covariance matrix. This produces transformed input vectors whose components are uncorrelated and ordered according to the magnitude of their variance.

Those components that contribute only a small amount to the total variance in the data set are eliminated. It is assumed that the input data set has already been normalized so that it has a zero mean. The function prestd can be used to normalize the data.

See Also prestd, premnmx

References Jolliffe, I.T., Principal Component Analysis, New York: Springer-Verlag, 1986.
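The Algorithm paragraph describes projecting zero-mean inputs onto covariance eigenvectors and dropping components whose share of the total variance is at most min_frac. The Python sketch below illustrates that idea for R = 2 only. It substitutes a closed-form 2 x 2 eigen-decomposition for the SVD the routine uses, and the name prepca_2row is hypothetical.

```python
import math

def prepca_2row(p, min_frac):
    """PCA for a zero-mean input set with R = 2 rows.

    p is [row0, row1]. Returns (ptrans, trans_mat); the rows of trans_mat are the
    kept covariance eigenvectors, largest variance first. Assumes non-constant data.
    """
    x, y = p
    q = len(x)
    sxx = sum(v * v for v in x) / (q - 1)
    syy = sum(v * v for v in y) / (q - 1)
    sxy = sum(u * v for u, v in zip(x, y)) / (q - 1)
    # Closed-form eigenvalues of the symmetric 2 x 2 covariance matrix.
    half_tr = (sxx + syy) / 2
    disc = math.sqrt(max(half_tr ** 2 - (sxx * syy - sxy ** 2), 0.0))
    lam1, lam2 = half_tr + disc, half_tr - disc
    # Eigenvector for the larger eigenvalue; the second one is orthogonal to it.
    if abs(sxy) > 1e-12:
        v1 = (lam1 - syy, sxy)
    else:
        v1 = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])
    total = lam1 + lam2
    # Keep only components contributing more than min_frac of the total variance.
    trans_mat = [v for v, lam in ((v1, lam1), (v2, lam2)) if lam / total > min_frac]
    ptrans = [[v[0] * a + v[1] * b for a, b in zip(x, y)] for v in trans_mat]
    return ptrans, trans_mat

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
ptrans, trans_mat = prepca_2row([x, [2 * v for v in x]], 0.02)
print(len(ptrans))  # 1: the second row is an exact multiple of the first
```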
prestd

Purpose Preprocess data so that its mean is 0 and the standard deviation is 1

Syntax [pn,meanp,stdp,tn,meant,stdt] = prestd(p,t)
[pn,meanp,stdp] = prestd(p)

Description prestd preprocesses the network training set by normalizing the inputs and targets so that they have means of zero and standard deviations of 1.

prestd(p,t) takes these inputs,
p - R x Q matrix of input (column) vectors.
t - S x Q matrix of target vectors.
and returns,
pn - R x Q matrix of normalized input vectors.
meanp - R x 1 vector containing mean for each P.
stdp - R x 1 vector containing standard deviations for each P.
tn - S x Q matrix of normalized target vectors.
meant - S x 1 vector containing mean for each T.
stdt - S x 1 vector containing standard deviations for each T.

Examples Here is the code to normalize a given data set so that the inputs and targets will have means of zero and standard deviations of 1.
p = [-0.92 0.73 -0.47 0.74 0.29; -0.08 0.86 -0.67 -0.52 0.93];
t = [-0.08 3.4 -0.82 0.69 3.1];
[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);

If you just want to normalize the input,
[pn,meanp,stdp] = prestd(p);

Algorithm pn = (p-meanp)/stdp;

See Also premnmx, prepca
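The Algorithm line subtracts each row's mean and divides by its standard deviation. A Python sketch for one row, using the target values from the example; the name prestd_row is hypothetical, and using the sample (n - 1) standard deviation is an assumption meant to match MATLAB's std default.

```python
import statistics

def prestd_row(p):
    """Normalize one row to zero mean and unit (sample) standard deviation.

    Returns (pn, meanp, stdp).
    """
    meanp = statistics.mean(p)
    stdp = statistics.stdev(p)
    return [(v - meanp) / stdp for v in p], meanp, stdp

pn, meanp, stdp = prestd_row([-0.08, 3.4, -0.82, 0.69, 3.1])
print(round(statistics.mean(pn), 12), round(statistics.stdev(pn), 12))
```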
purelin

Purpose Linear transfer function

Graph and Symbol [Figure: Linear Transfer Function, a = purelin(n)]

Syntax A = purelin(N)
info = purelin(code)

Description purelin is a transfer function. Transfer functions calculate a layer's output from its net input.

purelin(N) takes one input,
N - S x Q matrix of net input (column) vectors.
and returns N.

purelin(code) returns useful information for each code string:
'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.

Examples Here is the code to create a plot of the purelin transfer function.
n = -5:0.1:5;
a = purelin(n);
plot(n,a)

Network Use You can create a standard network that uses purelin by calling newlin or newlind.

To change a network so a layer uses purelin, set net.layers{i}.transferFcn to 'purelin'.

In either case, call sim to simulate the network with purelin. See newlin or newlind for simulation examples.

Algorithm purelin(n) = n

See Also sim, dpurelin, satlin, satlins
quant

Purpose Discretize values as multiples of a quantity

Syntax quant(X,Q)

Description quant(X,Q) takes two inputs,
X - Matrix, vector or scalar.
Q - Minimum value.
and returns values in X rounded to nearest multiple of Q.

Examples x = [1.333 4.756 -3.897];
y = quant(x,0.1)
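Rounding to the nearest multiple of Q means dividing by Q, rounding to an integer, and multiplying back. The Python sketch below rounds halves away from zero, which is an assumption intended to match MATLAB's round; Python's built-in round would instead round halves to even.

```python
import math

def quant(x, q):
    """Round each value in x to the nearest multiple of q (halves rounded away from zero)."""
    def round_half_away(v):
        return math.copysign(math.floor(abs(v) + 0.5), v)
    return [round_half_away(v / q) * q for v in x]

print(quant([1.333, 4.756, -3.897], 0.1))  # approx [1.3, 4.8, -3.9]
```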


radbas

Purpose Radial basis transfer function

Graph and Symbol [Figure: Radial Basis Function, a = radbas(n); n axis marked at -0.833 and +0.833]

Syntax A = radbas(N)
info = radbas(code)

Description radbas is a transfer function. Transfer functions calculate a layer's output from its net input.

radbas(N) takes one input,
N - S x Q matrix of net input (column) vectors.
and returns each element of N passed through a radial basis function.

radbas(code) returns useful information for each code string:
'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.

Examples Here we create a plot of the radbas transfer function.
n = -5:0.1:5;
a = radbas(n);
plot(n,a)

Network Use You can create a standard network that uses radbas by calling newpnn or newgrnn.

To change a network so that a layer uses radbas, set net.layers{i}.transferFcn to 'radbas'.

In either case, call sim to simulate the network with radbas. See newpnn or newgrnn for simulation examples.

Algorithm radbas(N) calculates its output as:
a = exp(-n^2)

See Also sim, tribas, dradbas
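The formula a = exp(-n^2) peaks at 1 for n = 0 and falls to about 0.5 at n = -0.833 and n = +0.833, the points marked on the graph. A Python sketch on a flat list (illustrative, not the toolbox code):

```python
import math

def radbas(n):
    """Radial basis transfer: a = exp(-n^2), elementwise."""
    return [math.exp(-v * v) for v in n]

print(radbas([0.0, 0.833, -0.833]))  # first value is 1.0; the other two are close to 0.5
```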
randnc

Purpose Normalized column weight initialization function

Syntax W = randnc(S,PR)
W = randnc(S,R)

Description randnc is a weight initialization function.

randnc(S,PR) takes two inputs,
S - Number of rows (neurons).
PR - R x 2 matrix of input value ranges = [Pmin Pmax].
and returns an S x R random matrix with normalized columns.

Can also be called as randnc(S,R).

Examples A random matrix of four normalized three-element columns is generated:
M = randnc(3,4)
M =
    0.6007    0.4715    0.2724    0.5596
    0.7628    0.6967    0.9172    0.7819
    0.2395    0.5406    0.2907    0.2747

See Also randnr
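A Python sketch of the same idea: draw an S x R random matrix, then scale each column to unit length. Drawing the entries uniformly from [0, 1) is an assumption suggested by the all-positive example output, not something this page states; the optional rng argument is added only to make the sketch reproducible.

```python
import math
import random

def randnc(s, r, rng=random):
    """S x R matrix of uniform [0, 1) values whose columns are scaled to unit length."""
    m = [[rng.random() for _ in range(r)] for _ in range(s)]
    lengths = [math.sqrt(sum(m[i][j] ** 2 for i in range(s))) for j in range(r)]
    return [[m[i][j] / lengths[j] for j in range(r)] for i in range(s)]

M = randnc(3, 4, random.Random(0))
for row in M:
    print(["%.4f" % v for v in row])
```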
randnr

Purpose Normalized row weight initialization function

Syntax W = randnr(S,PR)
W = randnr(S,R)

Description randnr is a weight initialization function.

randnr(S,PR) takes two inputs,
S - Number of rows (neurons).
PR - R x 2 matrix of input value ranges = [Pmin Pmax].
and returns an S x R random matrix with normalized rows.

Can also be called as randnr(S,R).

Examples A matrix of three normalized four-element rows is generated:
M = randnr(3,4)
M =
    0.9713    0.0800    0.1838    0.1282
    0.8228    0.0338    0.1797    0.5381
    0.3042    0.5725    0.5436    0.5331

See Also randnc
rands

Purpose Symmetric random weight/bias initialization function

Syntax W = rands(S,PR)
M = rands(S,R)
v = rands(S);

Description rands is a weight/bias initialization function.

rands(S,PR) takes,
S - Number of neurons.
PR - R x 2 matrix of R input ranges.
and returns an S-by-R weight matrix of random values between -1 and 1.

rands(S,R) returns an S-by-R matrix of random values. rands(S) returns an S-by-1 vector of random values.

Examples Here three sets of random values are generated with rands.
rands(4,[0 1; -2 2])
rands(4)
rands(2,3)

Network Use To prepare the weights and the bias of layer i of a custom network to be initialized with rands:
1 Set net.initFcn to 'initlay'. (net.initParam will automatically become initlay's default parameters.)
2 Set net.layers{i}.initFcn to 'initwb'.
3 Set each net.inputWeights{i,j}.initFcn to 'rands'. Set each net.layerWeights{i,j}.initFcn to 'rands'. Set each net.biases{i}.initFcn to 'rands'.

To initialize the network call init.

See Also randnr, randnc, initwb, initlay, init
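Since the result depends only on S and the number of columns R (not on the range values in PR), the behavior can be sketched in Python as an S x R draw from [-1, 1]. A uniform distribution is an assumption of this sketch; the page only says the values lie between -1 and 1.

```python
import random

def rands(s, r=1, rng=random):
    """S x R matrix (default S x 1) of values drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(r)] for _ in range(s)]

W = rands(2, 3, random.Random(1))
v = rands(4)  # S-by-1, like rands(S)
print(W)
```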
randtop

Purpose Random layer topology function

Syntax pos = randtop(dim1,dim2,...,dimN)

Description randtop calculates the neuron positions for layers whose neurons are arranged in an N-dimensional random pattern.

randtop(dim1,dim2,...,dimN) takes N arguments,
dimi - Length of layer in dimension i.
and returns an N x S matrix of N coordinate vectors, where S is the product dim1*dim2*...*dimN.

Examples This code creates and displays a two-dimensional layer with 192 neurons arranged in a 16-by-12 random pattern.
pos = randtop(16,12); plotsom(pos)

This code plots the connections between the same neurons, but shows each neuron at the location of its weight vector. The weights are generated randomly so that the layer is very unorganized, as is evident in the plot.
W = rands(192,2); plotsom(W,dist(pos))

See Also gridtop, hextop
revert

Purpose Change network weights and biases to previous initialization values

Syntax net = revert(net)

Description revert(net) returns neural network net with weight and bias values restored to the values generated the last time the network was initialized.

If the network has been altered so that it has different weight and bias connections or different input or layer sizes, then revert cannot set the weights and biases to their previous values and they will be set to zeros instead.

Examples Here a perceptron is created with a two-element input (with ranges of 0 to 1, and -2 to 2) and one neuron. Once it is created we can display the neuron's weights and bias.
net = newp([0 1;-2 2],1);

The initial network has weights and biases with zero values.
net.iw{1,1}, net.b{1}

We can change these values as follows.
net.iw{1,1} = [1 2];
net.b{1} = 5;
net.iw{1,1}, net.b{1}

We can recover the network's initial values as follows.
net = revert(net);
net.iw{1,1}, net.b{1}

See Also init, sim, adapt, train
satlin

Purpose Saturating linear transfer function

Graph and Symbol [Figure: Satlin Transfer Function, a = satlin(n)]

Syntax A = satlin(N)
info = satlin(code)

Description satlin is a transfer function. Transfer functions calculate a layer's output from its net input.

satlin(N) takes one input,
N - S x Q matrix of net input (column) vectors.
and returns values of N truncated into the interval [0, 1].

satlin(code) returns useful information for each code string:
'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.

Examples Here is the code to create a plot of the satlin transfer function.
n = -5:0.1:5;
a = satlin(n);
plot(n,a)

Network Use To change a network so that a layer uses satlin, set net.layers{i}.transferFcn to 'satlin'.

Call sim to simulate the network with satlin. See newhop for simulation examples.

Algorithm satlin(n) = 0, if n <= 0; n, if 0 <= n <= 1; 1, if 1 <= n.

See Also sim, poslin, satlins, purelin
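The three cases of the Algorithm line collapse into a clip to [0, 1]. A Python sketch on a flat list (illustrative only):

```python
def satlin(n):
    """Saturating linear transfer: 0 for n <= 0, n for 0 <= n <= 1, 1 for n >= 1."""
    return [min(max(v, 0.0), 1.0) for v in n]

print(satlin([-2.0, 0.25, 3.0]))  # [0.0, 0.25, 1.0]
```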
satlins

Purpose Symmetric saturating linear transfer function

Graph and Symbol [Figure: Satlins Transfer Function, a = satlins(n)]

Syntax A = satlins(N)
info = satlins(code)

Description satlins is a transfer function. Transfer functions calculate a layer's output from its net input.

satlins(N) takes one input,
N - S x Q matrix of net input (column) vectors.
and returns values of N truncated into the interval [-1, 1].

satlins(code) returns useful information for each code string:
'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.

Examples Here is the code to create a plot of the satlins transfer function.
n = -5:0.1:5;
a = satlins(n);
plot(n,a)

Network Use You can create a standard network that uses satlins by calling newhop.

To change a network so that a layer uses satlins, set net.layers{i}.transferFcn to 'satlins'.

In either case, call sim to simulate the network with satlins. See newhop for simulation examples.

Algorithm satlins(n) = -1, if n <= -1; n, if -1 <= n <= 1; 1, if 1 <= n.

See Also sim, satlin, poslin, purelin
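The symmetric version differs from satlin only in its lower limit, so the Algorithm line becomes a clip to [-1, 1]. A Python sketch on a flat list (illustrative only):

```python
def satlins(n):
    """Symmetric saturating linear transfer: clip each net input into [-1, 1]."""
    return [min(max(v, -1.0), 1.0) for v in n]

print(satlins([-2.0, -0.5, 0.25, 3.0]))  # [-1.0, -0.5, 0.25, 1.0]
```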
seq2con

Purpose Convert sequential vectors to concurrent vectors

Syntax b = seq2con(s)

Description The Neural Network Toolbox represents batches of vectors with a matrix, and sequences of vectors with multiple columns of a cell array.

seq2con and con2seq allow concurrent vectors to be converted to sequential vectors, and back again.

seq2con(S) takes one input,
S - N x TS cell array of matrices with M columns.
and returns,
B - N x 1 cell array of matrices with M*TS columns.

Examples Here three sequential values are converted to concurrent values.
p1 = {1 4 2}
p2 = seq2con(p1)

Here two sequences of vectors over three time steps are converted to concurrent vectors.
p1 = {[1; 1] [5; 4] [1; 2]; [3; 9] [4; 1] [9; 8]}
p2 = seq2con(p1)

See Also con2seq, concur
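To show the bookkeeping seq2con performs, here is a Python sketch that models a cell array as a list of rows of matrices (each matrix a list of rows) and concatenates each row's matrices left to right, turning TS time steps of M columns into one matrix of M*TS columns. This list-based representation is an assumption of the sketch, not the toolbox's data structure.

```python
def hconcat(mats):
    """Concatenate matrices (lists of rows with equal row counts) left to right."""
    return [sum((m[i] for m in mats), []) for i in range(len(mats[0]))]

def seq2con(s):
    """Turn an N x TS 'cell array' (list of lists of matrices) into N concatenated matrices."""
    return [hconcat(row) for row in s]

# Like p1 = {1 4 2}: one sequence of three 1 x 1 values.
print(seq2con([[[[1]], [[4]], [[2]]]]))  # [[[1, 4, 2]]]

# Like the second example: two sequences of 2 x 1 vectors over three time steps.
p1 = [[[[1], [1]], [[5], [4]], [[1], [2]]],
      [[[3], [9]], [[4], [1]], [[9], [8]]]]
print(seq2con(p1))  # [[[1, 5, 1], [1, 4, 2]], [[3, 4, 9], [9, 1, 8]]]
```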
setx

Purpose Set all network weight and bias values with a single vector

Syntax net = setx(net,X)

Description This function sets a network's weights and biases to a vector of values.

net = setx(net,X) takes these inputs,
net - Neural network.
X - Vector of weight and bias values.

Examples Here we create a network with a two-element input, and one layer of three neurons.
net = newff([0 1; -1 1],[3]);

The network has six weights (3 neurons * 2 input elements) and three biases (3 neurons) for a total of nine weight and bias values. We can set them to random values as follows:
net = setx(net,rand(9,1));

We can then view the weight and bias values as follows:
net.iw{1,1}
net.b{1}

See Also getx, formx
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PurPpose Simulate a neural network 


Syntax [Y,Pf,Af,E,perf] = sim(net,P,Pi,Ai,T) 
[Y,Pf,Af,E,perf] = Sim(net,{Q TS},Pi,AI,T) 
[Y,Pf,Af,E,perf] = Sim(net,Q,Pi,Ai,T) 


To Gef Help Type help network/sinm 


Descripfion sim simulates neural networks. 
[Y,Pf,Af,E,perf] = sim(net,P,Pi,AI,T) takes， 


net - Network. 


P  - Network inputs, 

Pi - Initial input delay conditions, default = zeros. 
Ai - Initial layer delay condqitions, default = zeros. 
T ， - Network targets, default = zeros. 


and returns， 


Y “ - Network outputs. 

Pf - Final input delay conditions， 

Af  - Final layer delay conditions, 

E ， - Network errors. 

perf- Network performance. 
Note that arguments Pi, Ai, Pf, and Af are optional and need only be used for 
networks that have input or layer delays. 


sim's signal arguments can have two formats: cell array or matrix.

The cell array format is easiest to describe. It is most convenient for networks
with multiple inputs and outputs, and allows sequences of inputs to be
presented:

  P  - Ni x TS cell array, each element P{i,ts} is an Ri x Q matrix.
  Pi - Ni x ID cell array, each element Pi{i,k} is an Ri x Q matrix.
  Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.
  T  - Nt x TS cell array, each element T{i,ts} is a Vi x Q matrix.
  Y  - No x TS cell array, each element Y{i,ts} is a Ui x Q matrix.
  Pf - Ni x ID cell array, each element Pf{i,k} is an Ri x Q matrix.
  Af - Nl x LD cell array, each element Af{i,k} is an Si x Q matrix.
  E  - Nt x TS cell array, each element E{i,ts} is a Vi x Q matrix.

where

  Ni = net.numInputs
  Nl = net.numLayers
  No = net.numOutputs
  Nt = net.numTargets
  ID = net.numInputDelays
  LD = net.numLayerDelays
  TS = Number of time steps
  Q  = Batch size
  Ri = net.inputs{i}.size
  Si = net.layers{i}.size
  Ui = net.outputs{i}.size
  Vi = net.targets{i}.size

The columns of Pi, Ai, Pf, and Af are ordered from oldest delay condition to
most recent:

  Pi{i,k} = input i at time ts=k-ID.
  Pf{i,k} = input i at time ts=TS+k-ID.
  Ai{i,k} = layer output i at time ts=k-LD.
  Af{i,k} = layer output i at time ts=TS+k-LD.

The matrix format can be used if only one time step is to be simulated
(TS = 1). It is convenient for networks with only one input and output, but can
also be used with networks that have more.
Each matrix argument is found by storing the elements of the corresponding
cell array argument into a single matrix:

  P  - (sum of Ri) x Q matrix.
  Pi - (sum of Ri) x (ID*Q) matrix.
  Ai - (sum of Si) x (LD*Q) matrix.
  T  - (sum of Vi) x Q matrix.
  Y  - (sum of Ui) x Q matrix.
  Pf - (sum of Ri) x (ID*Q) matrix.
  Af - (sum of Si) x (LD*Q) matrix.
  E  - (sum of Vi) x Q matrix.

[Y,Pf,Af] = sim(net,{Q TS},Pi,Ai) is used for networks which do not have
an input, such as Hopfield networks, when cell array notation is used.


Examples      Here newp is used to create a perceptron layer with a two-element input (with
ranges of [0 1]), and a single neuron.

net = newp([0 1;0 1],1);

Here the perceptron is simulated for an individual vector, a batch of three
vectors, and a sequence of three vectors.

p1 = [.2; .9]; a1 = sim(net,p1)
p2 = [.2 .5 .1; .9 .3 .7]; a2 = sim(net,p2)
p3 = {[.2; .9] [.5; .3] [.1; .7]}; a3 = sim(net,p3)
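For a perceptron layer, simulation reduces to a = hardlim(Wp + b), applied to each column of a batch. hardlim outputs 1 when its argument is at least 0 and 0 otherwise. The sketch below uses made-up weights, not the values newp would produce. It also checks that presenting the same three vectors one step at a time gives the same per-vector outputs. This holds because the perceptron has no delays.

```python
# Hypothetical weights for a 1-neuron, 2-input perceptron layer.
W = [[1.0, -1.0]]
b = [0.2]


def hardlim(n):
    """Perceptron transfer function: 1 if n >= 0, otherwise 0."""
    return 1 if n >= 0 else 0


def sim_perceptron(columns):
    """Simulate a batch given as a list of input column vectors."""
    outputs = []
    for p in columns:
        a = [hardlim(sum(w * x for w, x in zip(row, p)) + bi) for row, bi in zip(W, b)]
        outputs.append(a)
    return outputs


batch = [[0.2, 0.9], [0.5, 0.3], [0.1, 0.7]]   # the columns of p2 in the example
print(sim_perceptron(batch))                    # [[0], [1], [0]]

# Presenting the same vectors as a sequence, one step at a time:
print([sim_perceptron([p])[0] for p in batch])  # [[0], [1], [0]]
```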


Here newlin is used to create a linear layer with a three-element input and two
neurons.

net = newlin([0 2;0 2;0 2],2,[0 1]);

Here the linear layer is simulated with a sequence of two input vectors using
the default initial input delay conditions (all zeros).

p1 = {[2; 0.5; 1] [1; 1.2; 0.1]};
[y1,pf] = sim(net,p1)

Here the layer is simulated for three more vectors using the previous final
input delay conditions as the new initial delay conditions.

p2 = {[0.5; 0.6; 1.8] [1.3; 1.6; 1.1] [0.2; 0.1; 0]};
[y2,pf] = sim(net,p2,pf)


Here newelm is used to create an Elman network with a one-element input, and
a layer 1 with three tansig neurons followed by a layer 2 with two purelin
neurons. Because it is an Elman network it has a tap delay line with a delay of
1 going from layer 1 to layer 1.

net = newelm([0 1],[3 2],{'tansig','purelin'})

Here the Elman network is simulated for a sequence of three values using
default initial delay conditions.

p1 = {0.2 0.7 0.1};
[y1,pf,af] = sim(net,p1)

Here the network is simulated for four more values, using the previous final
delay conditions as the new initial delay conditions.

p2 = {0.1 0.9 0.8 0.4};
[y2,pf,af] = sim(net,p2,pf,af)


Algorithm     sim uses these properties to simulate a network net.

net.numInputs, net.numLayers
net.outputConnect, net.biasConnect
net.inputConnect, net.layerConnect

These properties determine the network's weight and bias values, and the
number of delays associated with each weight:

net.IW{i,j}
net.LW{i,j}
net.b{i}
net.inputWeights{i,j}.delays
net.layerWeights{i,j}.delays

These function properties indicate how sim applies weight and bias values to
inputs to get each layer's output:

net.inputWeights{i,j}.weightFcn
net.layerWeights{i,j}.weightFcn
net.layers{i}.netInputFcn
net.layers{i}.transferFcn

See Chapter 2 for more information on network simulation.


See Also      init, adapt, train, revert
softmax

Purpose       Soft max transfer function


Graph and     [Figure: Softmax Transfer Function, bar graphs of input n and output a = softmax(n)]
Symbol


Syntax        A = softmax(N)
info = softmax(code)


Description   softmax is a transfer function. Transfer functions calculate a layer's output
from its net input.

softmax(N) takes one input argument,

  N - S x Q matrix of net input (column) vectors.

and returns output vectors with elements between 0 and 1, but with their size
relations intact.

softmax('code') returns information about this function. These codes are defined:

  'deriv'  - Name of derivative function.
  'name'   - Full name.
  'output' - Output range.
  'active' - Active input range.

softmax does not have a derivative function.


Examples      Here we define a net input vector N, calculate the output, and plot both with
bar graphs.

n = [0; 1; -0.5; 0.5]
a = softmax(n);
subplot(2,1,1), bar(n), ylabel('n')
subplot(2,1,2), bar(a), ylabel('a')
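Softmax maps each element to exp(n_i) divided by the sum of exp(n_j) over the vector. The outputs therefore lie between 0 and 1, sum to 1, and keep the ordering of the inputs. A minimal sketch for a single column vector follows. Subtracting the maximum first is a common numerical safeguard that leaves the result unchanged. It is not necessarily what the toolbox does internally.

```python
import math


def softmax(n):
    """Softmax of a list of net inputs: exp(n_i) / sum_j exp(n_j)."""
    m = max(n)                                  # shift for numerical safety
    exps = [math.exp(x - m) for x in n]
    total = sum(exps)
    return [e / total for e in exps]


a = softmax([0, 1, -0.5, 0.5])                  # the example's n
print([round(x, 2) for x in a])                 # [0.17, 0.46, 0.1, 0.28]
```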


Network Use   To change a network so that a layer uses softmax, set
net.layers{i}.transferFcn to 'softmax'.

Call sim to simulate the network with softmax. See newc or newpnn for
simulation examples.


See Also      sim, compet
srchbac

Purpose       One-dimensional minimization using backtracking


Syntax        [a,gX,perf,retcode,delta,tol] =
srchbac(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)


Description   srchbac is a linear search routine. It searches in a given direction to locate the
minimum of the performance function in that direction. It uses a technique
called backtracking.
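In its simplest form, backtracking starts from a trial step and shrinks it until the performance drops enough. The sketch below is a simplified Armijo backtracking loop that only halves the step. srchbac's documented parameters (beta, low_lim, up_lim, minstep, maxstep) point to a more elaborate procedure, so treat this as an illustration of the idea rather than srchbac's implementation. The function name, default values, and the two return codes are hypothetical. The codes only loosely echo the documented "normal" and "minimum step taken".

```python
def backtrack(f, x, d, fx, slope, delta=1.0, alpha=1e-3, shrink=0.5, min_step=1e-6):
    """Simplified Armijo backtracking along direction d from point x.

    f      : performance function of a parameter list
    fx     : f(x), the current performance
    slope  : directional derivative of f at x along d (negative for descent)
    Returns (step, code): code 0 if a step met the sufficient-reduction test,
    1 if the search gave up and returned min_step.
    """
    a = delta
    while a >= min_step:
        trial = [xi + a * di for xi, di in zip(x, d)]
        # Accept the first step that reduces f by at least alpha*a*|slope|.
        if f(trial) <= fx + alpha * a * slope:
            return a, 0
        a *= shrink                # otherwise halve the step and retry
    return min_step, 1


f = lambda v: (v[0] - 1.0) ** 2
# At x = 3 the gradient is 4, so along d = -10 the slope is -40.
step, code = backtrack(f, [3.0], [-10.0], f([3.0]), slope=-40.0)
print(step, code)  # 0.25 0
```

Steps of 1 and 0.5 overshoot the minimum at 1 so badly that performance rises. The step of 0.25 lands at 0.5 and is accepted.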


srchbac(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)
takes these inputs,

  net     - Neural network.
  X       - Vector containing current values of weights and biases.
  Pd      - Delayed input vectors.
  Tl      - Layer target vectors.
  Ai      - Initial layer delay conditions.
  Q       - Batch size.
  TS      - Time steps.
  dX      - Search direction vector.
  gX      - Gradient vector.
  perf    - Performance value at current X.
  dperf   - Slope of performance value at current X in direction of dX.
  delta   - Initial step size.
  tol     - Tolerance on search.
  ch_perf - Change in performance on previous step.

and returns,

  a       - Step size, which minimizes performance.
  gX      - Gradient at new minimum point.
  perf    - Performance value at new minimum point.
  retcode - Return code, which has three elements. The first two elements
            correspond to the number of function evaluations in the two stages of the
            search. The third element is a return code. These will have different
            meanings for different search algorithms. Some may not be used in this
            function.
              0 - normal; 1 - minimum step taken;
              2 - maximum step taken; 3 - beta condition not met.
  delta   - New initial step size, based on the current step size.
  tol     - New tolerance on search.

Parameters used for the backtracking algorithm are:

  alpha     - Scale factor that determines sufficient reduction in perf.
  beta      - Scale factor that determines sufficiently large step size.
  low_lim   - Lower limit on change in step size.
  up_lim    - Upper limit on change in step size.
  maxstep   - Maximum step length.
  minstep   - Minimum step length.
  scale_tol - Parameter which relates the tolerance tol to the initial step
              size delta. Usually set to 20.

The defaults for these parameters are set in the training function that calls it.
See traincgf, traincgb, traincgp, trainbfg, trainoss.

Dimensions for these variables are:

  Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
  Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
  Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

  Ni  = net.numInputs
  Nl  = net.numLayers
  LD  = net.numLayerDelays
  Ri  = net.inputs{i}.size
  Si  = net.layers{i}.size
  Vi  = net.targets{i}.size
  Dij = Ri * length(net.inputWeights{i,j}.delays)


Examples      Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.
p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];

Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgf network training function and the
srchbac search function are to be used.

Create and Test a Network

net = newff([0 5],[2 1],{'tansig','logsig'},'traincgf');
a = sim(net,p)

Train and Retest the Network

net.trainParam.searchFcn = 'srchbac';
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


Network Use   You can create a standard network that uses srchbac with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgf, using the line search
function srchbac:

1 Set net.trainFcn to 'traincgf'. This will set net.trainParam to traincgf's
  default parameters.

2 Set net.trainParam.searchFcn to 'srchbac'.

The srchbac function can be used with any of the following training functions:
traincgf, traincgb, traincgp, trainbfg, trainoss.
Algorithm     srchbac locates the minimum of the performance function in the search
direction dX, using the backtracking algorithm described on pages 126 and 328
of Dennis and Schnabel's book noted below.


See Also      srchbre, srchcha, srchgol, srchhyb


References    Dennis, J. E., and R. B. Schnabel, Numerical Methods for Unconstrained
Optimization and Nonlinear Equations, Englewood Cliffs, NJ: Prentice-Hall,
1983.


srchbre

Purpose       One-dimensional interval location using Brent's method


Syntax        [a,gX,perf,retcode,delta,tol] =
srchbre(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)


Description   srchbre is a linear search routine. It searches in a given direction to locate the
minimum of the performance function in that direction. It uses a technique
called Brent's technique.
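Brent's method alternates parabolic-interpolation steps with golden-section steps and keeps the minimum bracketed throughout. The sketch below is the textbook minimization form of Brent's method on a fixed interval [a, b] (as popularized by Numerical Recipes), written for a scalar function. srchbre applies Brent's technique along the direction dX, and its bracketing and stopping rules may differ. The name brent_minimize is hypothetical.

```python
import math


def brent_minimize(f, a, b, tol=1e-8, max_iter=100):
    """Minimize a unimodal f on [a, b]: parabolic steps, golden-section fallback."""
    cgold = 0.5 * (3.0 - math.sqrt(5.0))  # golden-section fraction, about 0.382
    x = w = v = a + cgold * (b - a)       # x: best point, w: second best, v: previous w
    fx = fw = fv = f(x)
    d = e = 0.0                           # e: size of the step before last
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        tol1 = tol * abs(x) + 1e-12
        tol2 = 2.0 * tol1
        if abs(x - m) <= tol2 - 0.5 * (b - a):
            break                         # bracket is small enough
        golden_step = True
        if abs(e) > tol1:
            # Trial parabolic fit through (v, fv), (w, fw), (x, fx).
            r = (x - w) * (fx - fv)
            q = (x - v) * (fx - fw)
            p = (x - v) * q - (x - w) * r
            q = 2.0 * (q - r)
            if q > 0.0:
                p = -p
            q = abs(q)
            e_prev, e = e, d
            # Accept only if the step stays inside [a, b] and shrinks fast enough.
            if abs(p) < abs(0.5 * q * e_prev) and q * (a - x) < p < q * (b - x):
                d = p / q
                u = x + d
                if u - a < tol2 or b - u < tol2:
                    d = tol1 if x < m else -tol1
                golden_step = False
        if golden_step:
            e = (b - x) if x < m else (a - x)
            d = cgold * e
        u = x + d if abs(d) >= tol1 else x + (tol1 if d >= 0 else -tol1)
        fu = f(u)
        if fu <= fx:                      # u is the new best point
            if u < x:
                b = x
            else:
                a = x
            v, w, x = w, x, u
            fv, fw, fx = fw, fx, fu
        else:                             # u only narrows the bracket
            if u < x:
                a = u
            else:
                b = u
            if fu <= fw or w == x:
                v, w, fv, fw = w, u, fw, fu
            elif fu <= fv or v == x or v == w:
                v, fv = u, fu
    return x, fx


x_min, f_min = brent_minimize(lambda t: (t - 2.0) ** 2, 0.0, 5.0)
```

On a quadratic like this one, the parabolic step recovers the minimum almost immediately. The golden-section fallback only guarantees progress when the parabola is unreliable.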


srchbre(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)
takes these inputs,

  net     - Neural network.
  X       - Vector containing current values of weights and biases.
  Pd      - Delayed input vectors.
  Tl      - Layer target vectors.
  Ai      - Initial layer delay conditions.
  Q       - Batch size.
  TS      - Time steps.
  dX      - Search direction vector.
  gX      - Gradient vector.
  perf    - Performance value at current X.
  dperf   - Slope of performance value at current X in direction of dX.
  delta   - Initial step size.
  tol     - Tolerance on search.
  ch_perf - Change in performance on previous step.

and returns,

  a       - Step size, which minimizes performance.
  gX      - Gradient at new minimum point.
  perf    - Performance value at new minimum point.
  retcode - Return code, which has three elements. The first two elements
            correspond to the number of function evaluations in the two stages of the
            search. The third element is a return code. These will have different
            meanings for different search algorithms. Some may not be used in this
            function.
              0 - normal; 1 - minimum step taken;
              2 - maximum step taken; 3 - beta condition not met.
  delta   - New initial step size, based on the current step size.
  tol     - New tolerance on search.
Parameters used for the Brent algorithm are:

  alpha     - Scale factor, which determines sufficient reduction in perf.
  beta      - Scale factor, which determines sufficiently large step size.
  bmax      - Largest step size.
  scale_tol - Parameter which relates the tolerance tol to the initial step
              size delta. Usually set to 20.

The defaults for these parameters are set in the training function that calls it.
See traincgf, traincgb, traincgp, trainbfg, trainoss.

Dimensions for these variables are:

  Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
  Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
  Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

  Ni  = net.numInputs
  Nl  = net.numLayers
  LD  = net.numLayerDelays
  Ri  = net.inputs{i}.size
  Si  = net.layers{i}.size
  Vi  = net.targets{i}.size
  Dij = Ri * length(net.inputWeights{i,j}.delays)


Examples      Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.

p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];
Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgf network training function and the
srchbre search function are to be used.

Create and Test a Network

net = newff([0 5],[2 1],{'tansig','logsig'},'traincgf');
a = sim(net,p)

Train and Retest the Network

net.trainParam.searchFcn = 'srchbre';
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


Network Use   You can create a standard network that uses srchbre with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgf, using the line search
function srchbre:

1 Set net.trainFcn to 'traincgf'. This will set net.trainParam to traincgf's
  default parameters.

2 Set net.trainParam.searchFcn to 'srchbre'.

The srchbre function can be used with any of the following training functions:
traincgf, traincgb, traincgp, trainbfg, trainoss.


Algorithm     srchbre brackets the minimum of the performance function in the search
direction dX, using Brent's algorithm described on page 46 of Scales (see
reference below). It is a hybrid algorithm based on the golden section search
and the quadratic approximation.


See Also      srchbac, srchcha, srchgol, srchhyb


References    Scales, L. E., Introduction to Non-Linear Optimization, New York:
Springer-Verlag, 1985.
srchcha

Purpose       One-dimensional minimization using Charalambous' method


Syntax        [a,gX,perf,retcode,delta,tol] =
srchcha(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)


Description   srchcha is a linear search routine. It searches in a given direction to locate the
minimum of the performance function in that direction. It uses a technique
based on Charalambous' method.


srchcha(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)
takes these inputs,

  net     - Neural network.
  X       - Vector containing current values of weights and biases.
  Pd      - Delayed input vectors.
  Tl      - Layer target vectors.
  Ai      - Initial layer delay conditions.
  Q       - Batch size.
  TS      - Time steps.
  dX      - Search direction vector.
  gX      - Gradient vector.
  perf    - Performance value at current X.
  dperf   - Slope of performance value at current X in direction of dX.
  delta   - Initial step size.
  tol     - Tolerance on search.
  ch_perf - Change in performance on previous step.

and returns,

  a       - Step size, which minimizes performance.
  gX      - Gradient at new minimum point.
  perf    - Performance value at new minimum point.
  retcode - Return code, which has three elements. The first two elements
            correspond to the number of function evaluations in the two stages of the
            search. The third element is a return code. These will have different
            meanings for different search algorithms. Some may not be used in this
            function.
              0 - normal; 1 - minimum step taken;
              2 - maximum step taken; 3 - beta condition not met.
  delta   - New initial step size, based on the current step size.
  tol     - New tolerance on search.


Parameters used for the Charalambous algorithm are:

  alpha     - Scale factor, which determines sufficient reduction in perf.
  beta      - Scale factor, which determines sufficiently large step size.
  gama      - Parameter to avoid small reductions in performance. Usually set to 0.1.
  scale_tol - Parameter which relates the tolerance tol to the initial step
              size delta. Usually set to 20.
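The alpha and beta parameters above read like the two classical step-acceptance tests of line searches. One is a sufficient-decrease (Armijo) test. The other is a curvature test that rejects steps that are too short. The sketch below assumes that standard interpretation, since the toolbox's exact inequalities are not given here. The function step_tests and its default alpha and beta values are hypothetical.

```python
def step_tests(perf0, dperf0, perf_a, dperf_a, a, alpha=1e-3, beta=0.9):
    """Standard line-search acceptance tests for a trial step of length a.

    perf0, dperf0   : performance and its slope along dX at the current X
    perf_a, dperf_a : the same two quantities at X + a*dX
    Returns (sufficient_reduction, large_enough).
    """
    # alpha test (Armijo): performance must fall by at least alpha*a*|dperf0|.
    sufficient_reduction = perf_a <= perf0 + alpha * a * dperf0
    # beta test (curvature): the slope must have risen to at least beta*dperf0,
    # which rules out steps that are too short.
    large_enough = dperf_a >= beta * dperf0
    return sufficient_reduction, large_enough


# Along the line, perf(a) = (a - 1)**2, so perf0 = 1 and dperf0 = -2.
perf = lambda a: (a - 1.0) ** 2
dperf = lambda a: 2.0 * (a - 1.0)
for a in (0.01, 1.0, 2.5):
    print(a, step_tests(perf(0.0), dperf(0.0), perf(a), dperf(a), a))
# 0.01 (True, False)   too short: slope still steep
# 1.0 (True, True)     the exact minimizer passes both tests
# 2.5 (False, True)    overshoot: performance went up
```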


The defaults for these parameters are set in the training function that calls it.
See traincgf, traincgb, traincgp, trainbfg, trainoss.

Dimensions for these variables are:

  Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
  Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
  Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

  Ni  = net.numInputs
  Nl  = net.numLayers
  LD  = net.numLayerDelays
  Ri  = net.inputs{i}.size
  Si  = net.layers{i}.size
  Vi  = net.targets{i}.size
  Dij = Ri * length(net.inputWeights{i,j}.delays)


Examples      Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.

p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];

Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgf network training function and the
srchcha search function are to be used.

Create and Test a Network

net = newff([0 5],[2 1],{'tansig','logsig'},'traincgf');
a = sim(net,p)

Train and Retest the Network

net.trainParam.searchFcn = 'srchcha';
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


Network Use   You can create a standard network that uses srchcha with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgf, using the line search
function srchcha:

1 Set net.trainFcn to 'traincgf'. This will set net.trainParam to traincgf's
  default parameters.

2 Set net.trainParam.searchFcn to 'srchcha'.

The srchcha function can be used with any of the following training functions:
traincgf, traincgb, traincgp, trainbfg, trainoss.


Algorithm     srchcha locates the minimum of the performance function in the search
direction dX, using an algorithm based on the method described in
Charalambous (see reference below).


See Also      srchbac, srchbre, srchgol, srchhyb


References    Charalambous, C., "Conjugate gradient algorithm for efficient training of
artificial neural networks," IEE Proceedings, vol. 139, no. 3, pp. 301-310,
June 1992.
srchgol

Purpose       One-dimensional minimization using golden section search


Syntax        [a,gX,perf,retcode,delta,tol] =
srchgol(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)


Description   srchgol is a linear search routine. It searches in a given direction to locate the
minimum of the performance function in that direction. It uses a technique
called the golden section search.
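Golden section search keeps two interior points in the golden ratio. Each iteration then reuses one previous function value and shrinks the bracket by a factor of about 0.618. The sketch below shows the textbook method on a scalar function over a fixed interval [a, b]. srchgol works along dX with its own interval-location and stopping rules (note its bmax and scale_tol parameters), so this illustrates the technique rather than srchgol itself. The name golden_section is hypothetical.

```python
import math


def golden_section(f, a, b, tol=1e-6):
    """Shrink [a, b] around the minimizer of a unimodal f by the golden ratio."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


x_min = golden_section(lambda t: (t - 2.0) ** 2, 0.0, 5.0)
```

Only one new function evaluation is needed per iteration. This is the main appeal of the golden ratio over simply splitting the interval into thirds.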


srchgol(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)
takes these inputs,

  net     - Neural network.
  X       - Vector containing current values of weights and biases.
  Pd      - Delayed input vectors.
  Tl      - Layer target vectors.
  Ai      - Initial layer delay conditions.
  Q       - Batch size.
  TS      - Time steps.
  dX      - Search direction vector.
  gX      - Gradient vector.
  perf    - Performance value at current X.
  dperf   - Slope of performance value at current X in direction of dX.
  delta   - Initial step size.
  tol     - Tolerance on search.
  ch_perf - Change in performance on previous step.

and returns,

  a       - Step size, which minimizes performance.
  gX      - Gradient at new minimum point.
  perf    - Performance value at new minimum point.
  retcode - Return code, which has three elements. The first two elements
            correspond to the number of function evaluations in the two stages of the
            search. The third element is a return code. These will have different
            meanings for different search algorithms. Some may not be used in this
            function.
              0 - normal; 1 - minimum step taken;
              2 - maximum step taken; 3 - beta condition not met.
  delta   - New initial step size, based on the current step size.
  tol     - New tolerance on search.


Parameters used for the golden section algorithm are:

  alpha     - Scale factor, which determines sufficient reduction in perf.
  bmax      - Largest step size.
  scale_tol - Parameter which relates the tolerance tol to the initial step
              size delta. Usually set to 20.

The defaults for these parameters are set in the training function that calls it.
See traincgf, traincgb, traincgp, trainbfg, trainoss.

Dimensions for these variables are:

  Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
  Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
  Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

  Ni  = net.numInputs
  Nl  = net.numLayers
  LD  = net.numLayerDelays
  Ri  = net.inputs{i}.size
  Si  = net.layers{i}.size
  Vi  = net.targets{i}.size
  Dij = Ri * length(net.inputWeights{i,j}.delays)


Examples      Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.

p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];


Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgf network training function and the
srchgol search function are to be used.

Create and Test a Network

net = newff([0 5],[2 1],{'tansig','logsig'},'traincgf');
a = sim(net,p)

Train and Retest the Network

net.trainParam.searchFcn = 'srchgol';
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


Network Use   You can create a standard network that uses srchgol with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgf, using the line search
function srchgol:

1 Set net.trainFcn to 'traincgf'. This will set net.trainParam to traincgf's
  default parameters.

2 Set net.trainParam.searchFcn to 'srchgol'.

The srchgol function can be used with any of the following training functions:
traincgf, traincgb, traincgp, trainbfg, trainoss.


Algorithm     srchgol locates the minimum of the performance function in the search
direction dX, using the golden section search. It is based on the algorithm as
described on page 33 of Scales (see reference below).


See Also      srchbac, srchbre, srchcha, srchhyb


References    Scales, L. E., Introduction to Non-Linear Optimization, New York:
Springer-Verlag, 1985.
srchhyb

Purpose       One-dimensional minimization using a hybrid bisection-cubic search


Syntax        [a,gX,perf,retcode,delta,tol] =
srchhyb(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)


Description   srchhyb is a linear search routine. It searches in a given direction to locate the
minimum of the performance function in that direction. It uses a technique
that is a combination of a bisection and a cubic interpolation.
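A common way to combine bisection with cubic interpolation works on a bracket [a, b] where the slope is negative at a and positive at b. Fit the cubic that matches the function value and slope at both ends, and take its minimizer as the next trial point. Fall back to bisection whenever that point is unusable. The sketch below follows this generic safeguarded scheme, using the standard two-point Hermite cubic formula. The 10% margin rule, the function names, and the tolerances are assumptions, not srchhyb's actual rules.

```python
import math


def cubic_min(a, fa, ga, b, fb, gb):
    """Minimizer of the cubic matching f and f' at a and b, or None if unusable."""
    d1 = ga + gb - 3.0 * (fa - fb) / (a - b)
    disc = d1 * d1 - ga * gb
    if disc < 0.0:
        return None                        # no real minimizer: caller bisects
    d2 = math.copysign(math.sqrt(disc), b - a)
    denom = gb - ga + 2.0 * d2
    if denom == 0.0:
        return None
    return b - (b - a) * (gb + d2 - d1) / denom


def hybrid_search(f, df, a, b, tol=1e-8, max_iter=100):
    """Locate a minimizer in [a, b], assuming df(a) < 0 < df(b)."""
    fa, ga = f(a), df(a)
    fb, gb = f(b), df(b)
    for _ in range(max_iter):
        if b - a <= tol:
            break
        x = cubic_min(a, fa, ga, b, fb, gb)
        # Bisect if the cubic step is unusable or hugs an end of the bracket.
        margin = 0.1 * (b - a)
        if x is None or not (a + margin <= x <= b - margin):
            x = 0.5 * (a + b)
        fx, gx = f(x), df(x)
        if gx == 0.0:
            return x                       # stationary point found exactly
        if gx < 0.0:                       # keep the sign change inside [a, b]
            a, fa, ga = x, fx, gx
        else:
            b, fb, gb = x, fx, gx
    return 0.5 * (a + b)


x_min = hybrid_search(lambda t: (t - 2.0) ** 2, lambda t: 2.0 * (t - 2.0), 0.0, 5.0)
print(x_min)  # 2.0
```

For a quadratic the cubic fit is exact, so the first trial point is the minimizer. Bisection only takes over when the interpolated point is missing or too close to an end of the bracket.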


srchhyb(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)
takes these inputs,

  net     - Neural network.
  X       - Vector containing current values of weights and biases.
  Pd      - Delayed input vectors.
  Tl      - Layer target vectors.
  Ai      - Initial layer delay conditions.
  Q       - Batch size.
  TS      - Time steps.
  dX      - Search direction vector.
  gX      - Gradient vector.
  perf    - Performance value at current X.
  dperf   - Slope of performance value at current X in direction of dX.
  delta   - Initial step size.
  tol     - Tolerance on search.
  ch_perf - Change in performance on previous step.

and returns,

  a       - Step size, which minimizes performance.
  gX      - Gradient at new minimum point.
  perf    - Performance value at new minimum point.
  retcode - Return code, which has three elements. The first two elements
            correspond to the number of function evaluations in the two stages of the
            search. The third element is a return code. These will have different
            meanings for different search algorithms. Some may not be used in this
            function.
              0 - normal; 1 - minimum step taken;
              2 - maximum step taken; 3 - beta condition not met.
  delta   - New initial step size, based on the current step size.
  tol     - New tolerance on search.

Parameters used for the hybrid bisection-cubic algorithm are:

  alpha     - Scale factor, which determines sufficient reduction in perf.
  beta      - Scale factor, which determines sufficiently large step size.
  bmax      - Largest step size.
  scale_tol - Parameter which relates the tolerance tol to the initial step
              size delta. Usually set to 20.

The defaults for these parameters are set in the training function that calls it.
See traincgf, traincgb, traincgp, trainbfg, trainoss.

Dimensions for these variables are:

  Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
  Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
  Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

  Ni  = net.numInputs
  Nl  = net.numLayers
  LD  = net.numLayerDelays
  Ri  = net.inputs{i}.size
  Si  = net.layers{i}.size
  Vi  = net.targets{i}.size
  Dij = Ri * length(net.inputWeights{i,j}.delays)


Examples      Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.

p = [0 1 2 3 4 5];


t = [0 0 0 1 1 1];

Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgf network training function and the
srchhyb search function are to be used.

Create and Test a Network

net = newff([0 5],[2 1],{'tansig','logsig'},'traincgf');
a = sim(net,p)

Train and Retest the Network

net.trainParam.searchFcn = 'srchhyb';
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


Network Use   You can create a standard network that uses srchhyb with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgf, using the line search
function srchhyb:

1 Set net.trainFcn to 'traincgf'. This will set net.trainParam to traincgf's
  default parameters.

2 Set net.trainParam.searchFcn to 'srchhyb'.

The srchhyb function can be used with any of the following training functions:
traincgf, traincgb, traincgp, trainbfg, trainoss.


Algorithm     srchhyb locates the minimum of the performance function in the search
direction dX, using the hybrid bisection-cubic interpolation algorithm described
on page 50 of Scales (see reference below).


See Also      srchbac, srchbre, srchcha, srchgol


References    Scales, L. E., Introduction to Non-Linear Optimization, New York:
Springer-Verlag, 1985.
sse

Purpose       Sum squared error performance function


Syntax        perf = sse(E,X,PP)
perf = sse(E,net,PP)
info = sse(code)


Description   sse is a network performance function. It measures performance according to
the sum of squared errors.

sse(E,X,PP) takes from one to three arguments,

  E  - Matrix or cell array of error vector(s).
  X  - Vector of all weight and bias values (ignored).
  PP - Performance parameters (ignored).

and returns the sum squared error.

sse(E,net,PP) can take an alternate argument to X,

  net - Neural network from which X can be obtained (ignored).

sse(code) returns useful information for each code string:

  'deriv'     - Name of derivative function.
  'name'      - Full name.
  'pnames'    - Names of training parameters.
  'pdefaults' - Default training parameters.


Examples      Here a two-layer feed-forward network is created with a 1-element input ranging from
-10 to 10, four hidden tansig neurons, and one purelin output neuron.

net = newff([-10 10],[4 1],{'tansig','purelin'});

Here the network is given a batch of inputs P. The error is calculated by
subtracting the output A from target T. Then the sum squared error is
calculated.
p = [-10 -5 0 5 10];
t = [0 0 1 1 1];
y = sim(net,p)
e = t-y
perf = sse(e)


Note that sse can be called with only one argument because the other
arguments are ignored. sse supports those arguments to conform to the
standard performance function argument list.
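The quantity sse computes is simply the sum of the squares of every error element, whether the errors come as a vector, a matrix, or a cell array. The minimal sketch below works on plain Python lists, with nesting standing in for matrices and cell arrays. The outputs y are made-up values, not the network's actual output.

```python
def sse(e):
    """Sum of squared errors over a vector, a matrix, or nested lists of them."""
    total = 0.0
    for item in e:
        if isinstance(item, (list, tuple)):
            total += sse(item)      # recurse into rows / cell-like nesting
        else:
            total += item * item
    return total


t = [0, 0, 1, 1, 1]
y = [0.1, -0.2, 0.9, 1.0, 1.3]      # made-up outputs, not from a trained network
e = [ti - yi for ti, yi in zip(t, y)]
print(round(sse(e), 10))            # 0.15
```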


Network Use   To prepare a custom network to be trained with sse, set net.performFcn to
'sse'. This will automatically set net.performParam to the empty matrix [], as
sse has no performance parameters.

Calling train or adapt will result in sse being used to calculate performance.


See Also      dsse


sumsqr

Purpose       Sum squared elements of a matrix

Syntax        sumsqr(M)

Description   sumsqr(M) returns the sum of the squared elements in M.

Examples      s = sumsqr([1 2;3 4])
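Worked by hand, the example squares and adds the four elements of the matrix:

$$\mathrm{sumsqr}\left(\begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix}\right) = 1^2 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30$$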
tansig

Purpose       Hyperbolic tangent sigmoid transfer function


Graph and     [Figure: Tan-Sigmoid Transfer Function, a = tansig(n)]
Symbol


Syntax        A = tansig(N)
info = tansig(code)


Description   tansig is a transfer function. Transfer functions calculate a layer's output from
its net input.

tansig(N) takes one input,

  N - S x Q matrix of net input (column) vectors.

and returns each element of N squashed between -1 and 1.

tansig(code) returns useful information for each code string:

  'deriv'  - Name of derivative function.
  'name'   - Full name.
  'output' - Output range.
  'active' - Active input range.

tansig is named after the hyperbolic tangent, which has the same shape.
However, tanh may be more accurate and is recommended for applications that
require the hyperbolic tangent.


Examples      Here is the code to create a plot of the tansig transfer function.

n = -5:0.1:5;
a = tansig(n);
plot(n,a)
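tansig evaluates the closed form 2/(1+exp(-2n)) - 1, which is mathematically equal to tanh(n). The sketch below implements that formula in Python and checks it against math.tanh over the same grid as the plotting example. The odd-symmetry branch is an addition of this sketch that keeps exp from overflowing for large negative n. It is not necessarily how the toolbox handles that case.

```python
import math


def tansig(n):
    """Tan-sigmoid: 2/(1 + exp(-2n)) - 1, mathematically equal to tanh(n)."""
    if n < 0:
        # tansig is odd; evaluating at -n keeps exp's argument <= 0, avoiding overflow.
        return -tansig(-n)
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


ns = [k / 10 for k in range(-50, 51)]          # n = -5:0.1:5
a = [tansig(n) for n in ns]
max_diff = max(abs(ai - math.tanh(n)) for ai, n in zip(a, ns))
print(max_diff < 1e-12)                        # True
```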
Network Use   You can create a standard network that uses tansig by calling newff or newcf.

To change a network so a layer uses tansig, set
net.layers{i}.transferFcn to 'tansig'.

In either case, call sim to simulate the network with tansig. See newff or
newcf for simulation examples.


Algorithm     tansig(N) calculates its output according to:

a = 2/(1+exp(-2*n))-1

This is mathematically equivalent to tanh(N). It differs in that it runs faster
than the MATLAB implementation of tanh, but the results can have very small
numerical differences. This function is a good trade-off for neural networks,
where speed is important and the exact shape of the transfer function is not.


See Also      sim, dtansig, logsig


References    Vogl, T. P., J. K. Mangis, A. K. Rigler, W. T. Zink, and D. L. Alkon, "Accelerating
the convergence of the backpropagation method," Biological Cybernetics, vol.
59, pp. 257-263, 1988.
train

Purpose       Train a neural network

Syntax        [net,tr,Y,E,Pf,Af] = train(net,P,T,Pi,Ai,VV,TV)

To Get Help   Type help network/train

Description   train trains a network net according to net.trainFcn and net.trainParam.


train(net,P,T,Pi,Ai,VV,TV) takes,

  net - Neural network.
  P   - Network inputs.
  T   - Network targets, default = zeros.
  Pi  - Initial input delay conditions, default = zeros.
  Ai  - Initial layer delay conditions, default = zeros.
  VV  - Structure of validation vectors, default = [].
  TV  - Structure of test vectors, default = [].

and returns,

  net - New network.
  TR  - Training record (epoch and perf).
  Y   - Network outputs.
  E   - Network errors.
  Pf  - Final input delay conditions.
  Af  - Final layer delay conditions.


NotethatTis optional andneed only be used for networks thatrequire targets. 
Pi and Pf are also optional and need only be used for networks that have input 
or layer delays. 


Optional arguments VV and TV are described below. 


train's signal arguments can have two formats: cell array or matrix.

The cell array format is easiest to describe. It is most convenient for networks
with multiple inputs and outputs, and allows sequences of inputs to be
presented:

P  - Ni x TS cell array, each element P{i,ts} is an Ri x Q matrix.
T  - Nt x TS cell array, each element T{i,ts} is a Vi x Q matrix.
Pi - Ni x ID cell array, each element Pi{i,k} is an Ri x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.
Y  - No x TS cell array, each element Y{i,ts} is a Ui x Q matrix.
E  - Nt x TS cell array, each element E{i,ts} is a Vi x Q matrix.
Pf - Ni x ID cell array, each element Pf{i,k} is an Ri x Q matrix.
Af - Nl x LD cell array, each element Af{i,k} is an Si x Q matrix.

where

Ni = net.numInputs
Nl = net.numLayers
Nt = net.numTargets
ID = net.numInputDelays
LD = net.numLayerDelays
TS = Number of time steps
Q  = Batch size
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Vi = net.targets{i}.size


The columns of Pi, Pf, Ai, and Af are ordered from the oldest delay condition to
the most recent:

Pi{i,k} = input i at time ts = k-ID.
Pf{i,k} = input i at time ts = TS+k-ID.
Ai{i,k} = layer output i at time ts = k-LD.
Af{i,k} = layer output i at time ts = TS+k-LD.

The matrix format can be used if only one time step is to be simulated (TS = 1).
It is convenient for networks with only one input and output, but can be used
with networks that have more.

Each matrix argument is found by storing the elements of the corresponding
cell array argument into a single matrix:

P  - (sum of Ri) x Q matrix.
T  - (sum of Vi) x Q matrix.
Pi - (sum of Ri) x (ID*Q) matrix.
Ai - (sum of Si) x (LD*Q) matrix.
Y  - (sum of Ui) x Q matrix.
E  - (sum of Vi) x Q matrix.
Pf - (sum of Ri) x (ID*Q) matrix.
Af - (sum of Si) x (LD*Q) matrix.
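For a single time step (TS = 1), the relationship between the two formats for P can be sketched in Python, with nested lists standing in for MATLAB matrices: each input's Ri x Q block is stacked vertically, in input order, to give a (sum of Ri) x Q matrix. The helper name cell_to_matrix is hypothetical and only illustrates the layout described above.

```python
def cell_to_matrix(blocks):
    """Stack per-input blocks (each Ri rows by Q columns) into a single
    (sum of Ri) x Q matrix, mirroring how P is built in matrix format."""
    q = len(blocks[0][0])
    stacked = []
    for block in blocks:            # input 1, input 2, ... in order
        for row in block:
            if len(row) != q:
                raise ValueError("every block must have Q columns")
            stacked.append(list(row))
    return stacked

# Two inputs: R1 = 2, R2 = 1, batch size Q = 3
P_cell = [
    [[1, 2, 3], [4, 5, 6]],         # P{1}: 2 x 3
    [[7, 8, 9]],                    # P{2}: 1 x 3
]
P_matrix = cell_to_matrix(P_cell)   # (2 + 1) x 3
```

Each column of the result still holds one of the Q concurrent vectors; only the rows from different inputs are concatenated.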


If VV and TV are supplied, they should be an empty matrix [] or a structure with
the following fields:


VV.P, TV.P   - Validation/test inputs.
VV.T, TV.T   - Validation/test targets, default = zeros.
VV.Pi, TV.Pi - Validation/test initial input delay conditions, default = zeros.
VV.Ai, TV.Ai - Validation/test layer delay conditions, default = zeros.


The validation vectors are used to stop training early if further training on the
primary vectors will hurt generalization to the validation vectors. Test vector
performance can be used to measure how well the network generalizes beyond
primary and validation vectors. If VV.T, VV.Pi, or VV.Ai are set to an empty
matrix or cell array, default values will be used. The same is true for TV.T,
TV.Pi, and TV.Ai.


Examples


Here input p and targets t define a simple function that we can plot:


p = [0 1 2 3 4 5 6 7 8];
t = [0 0.84 0.91 0.14 -0.77 -0.96 -0.28 0.66 0.99];
plot(p,t,'o')


Here newff is used to create a two-layer feed-forward network. The network
will have an input (ranging from 0 to 8), followed by a layer of 10 tansig
neurons, followed by a layer with 1 purelin neuron. trainlm backpropagation
is used. The network is also simulated.


net = newff([0 8],[10 1],{'tansig' 'purelin'},'trainlm');
y1 = sim(net,p)
plot(p,t,'o',p,y1,'x')


Here the network is trained for up to 50 epochs to an error goal of 0.01, and then
resimulated.


net.trainParam.epochs = 50;
net.trainParam.goal = 0.01;
net = train(net,p,t);
y2 = sim(net,p)
plot(p,t,'o',p,y1,'x',p,y2,'*')


Algorithm


train calls the function indicated by net.trainFcn, using the training
parameter values indicated by net.trainParam.


Typically one epoch of training is defined as a single presentation of all input
vectors to the network. The network is then updated according to the results of
all those presentations.


Training occurs until the maximum number of epochs is reached, the performance
goal is met, or any other stopping condition of the function net.trainFcn
occurs.


Some training functions depart from this norm by presenting only one input
vector (or sequence) each epoch. An input vector (or sequence) is chosen
randomly each epoch from concurrent input vectors (or sequences). newc and
newsom return networks that use trainr, a training function that presents
each input vector once in random order.


See Also


sim, init, adapt, revert
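The difference between the usual one-update-per-epoch scheme and the variants mentioned in the Algorithm section can be sketched in Python by listing which input vectors feed each update within one epoch. The scheme names are hypothetical labels for this illustration, not toolbox identifiers; only the "random_order" case corresponds to the trainr behavior described above.

```python
import random

def epoch_schedule(q, scheme, rng):
    """Return the updates made in one epoch over Q input vectors.

    Each entry is the list of vector indices used for that single update."""
    if scheme == "batch":           # one update from all Q vectors
        return [list(range(q))]
    if scheme == "one_random":      # one randomly chosen vector per epoch
        return [[rng.randrange(q)]]
    if scheme == "random_order":    # every vector once, in random order
        order = list(range(q))
        rng.shuffle(order)
        return [[i] for i in order]
    raise ValueError(f"unknown scheme: {scheme}")

rng = random.Random(1)
for scheme in ("batch", "one_random", "random_order"):
    print(scheme, epoch_schedule(5, scheme, rng))
```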
Purpose Batch training with weight and bias learning rules


Syntax [net,TR,Ac,El] = trainb(net,Pd,Tl,Ai,Q,TS,VV,TV)


info = trainb(code)


Description trainb is not called directly. Instead it is called by train for networks whose
net.trainFcn property is set to 'trainb'.


trainb trains a network with weight and bias learning rules with batch
updates. The weights and biases are updated at the end of an entire pass
through the input data.


trainb(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd  - Delayed inputs.
Tl  - Layer targets.
Ai  - Initial layer delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Empty matrix [] or structure of validation vectors.
TV  - Empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.


Training occurs according to trainb's training parameters, shown here
with their default values:


net.trainParam.epochs    100  Maximum number of epochs to train
net.trainParam.goal      0    Performance goal
net.trainParam.max_fail  5    Maximum validation failures
net.trainParam.show      25   Epochs between displays (NaN for no displays)
net.trainParam.time      inf  Maximum time to train in seconds


Dimensions for these variables are:


Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix or [].
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Rj * length(net.inputWeights{i,j}.delays)


If VV or TV is not [], it must be a structure of vectors:

VV.PD, TV.PD - Validation/test delayed inputs.
VV.Tl, TV.Tl - Validation/test layer targets.
VV.Ai, TV.Ai - Validation/test initial layer delay conditions.
VV.Q, TV.Q   - Validation/test batch size.
VV.TS, TV.TS - Validation/test time steps.


Validation vectors are used to stop training early if the network performance
on the validation vectors fails to improve or remains the same for max_fail
epochs in a row. Test vectors are used as a further check that the network is
generalizing well, but do not have any effect on training.
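The early-stopping rule described above can be sketched in Python as a counter over a sequence of validation performance values (lower is better). The sketch assumes that a "failure" is any epoch that does not improve on the best value seen so far, and that training stops on the max_fail-th consecutive failure; the toolbox's exact counting convention may differ by one.

```python
def early_stop_epoch(val_perf, max_fail=5):
    """Return the (0-based) epoch at which validation early stopping
    triggers, or None if it never does."""
    best = float("inf")
    fails = 0
    for epoch, perf in enumerate(val_perf):
        if perf < best:              # improvement: reset the failure count
            best = perf
            fails = 0
        else:                        # no improvement (worse or equal)
            fails += 1
            if fails >= max_fail:
                return epoch
    return None

history = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74, 0.73, 0.9]
print(early_stop_epoch(history, max_fail=5))   # 7
```

In practice the network weights from the best validation epoch (here epoch 2) are the ones worth keeping; test vectors would only be evaluated, never used in this decision.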
trainb(code) returns useful information for each code string:


'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.


Network Use


You can create a standard network that uses trainb by calling newlin.
To prepare a custom network to be trained with trainb:


1 Set net.trainFcn to 'trainb'.
  (This will set net.trainParam to trainb's default parameters.)
2 Set each net.inputWeights{i,j}.learnFcn to a learning function.
3 Set each net.layerWeights{i,j}.learnFcn to a learning function.
4 Set each net.biases{i}.learnFcn to a learning function. (Weight and bias
  learning parameters will automatically be set to default values for the given
  learning function.)


To train the network:


1 Set net.trainParam properties to desired values.
2 Set weight and bias learning parameters to desired values.
3 Call train.


See newlin for training examples.


Algorithm


Each weight and bias is updated according to its learning function after each
epoch (one pass through the entire set of input vectors).

Training stops when any of these conditions is met:


• The maximum number of epochs (repetitions) is reached.
• Performance has been minimized to the goal.
• The maximum amount of time has been exceeded.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also


newp, newlin, train
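To make the batch update concrete, here is a Python sketch of one possible learning rule applied in trainb's style: a single linear neuron trained with a Widrow-Hoff (LMS) rule, where the changes computed for every input vector are summed and applied only once per epoch. The rule, learning rate, epoch count, data, and the function name train_batch are assumptions for illustration; trainb itself applies whatever learnFcn each weight and bias has been given.

```python
def train_batch(P, T, lr=0.05, epochs=500):
    """Batch training of one linear neuron a = w.p + b.

    P is a list of Q input vectors and T the list of Q scalar targets.
    Weight and bias changes from every vector are accumulated and
    applied once, at the end of each pass through the data (one epoch)."""
    w = [0.0] * len(P[0])
    b = 0.0
    for _ in range(epochs):
        dw = [0.0] * len(w)
        db = 0.0
        for p, t in zip(P, T):
            a = sum(wi * pi for wi, pi in zip(w, p)) + b
            e = t - a                          # error for this vector
            for j, pj in enumerate(p):
                dw[j] += lr * e * pj           # Widrow-Hoff weight change
            db += lr * e
        w = [wi + dwi for wi, dwi in zip(w, dw)]   # apply once per epoch
        b += db
    return w, b

# Targets generated by t = 2*p1 - p2 + 0.5
P = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
T = [2 * p1 - p2 + 0.5 for p1, p2 in P]
w, b = train_batch(P, T)
print(w, b)
```

Because the update uses the summed error over the whole batch, the learning rate has to be small enough for that sum; with these five vectors, lr = 0.05 keeps the iteration stable.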
Purpose BFGS quasi-Newton backpropagation


Syntax [net,TR,Ac,El] = trainbfg(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = trainbfg(code)

Description trainbfg is a network training function that updates weight and bias values
according to the BFGS quasi-Newton method.

trainbfg(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial layer delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Either empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.


Training occurs according to trainbfg's training parameters, shown here with
their default values:


net.trainParam.epochs     100        Maximum number of epochs to train
net.trainParam.show       25         Epochs between showing progress
net.trainParam.goal       0          Performance goal
net.trainParam.time       inf        Maximum time to train in seconds
net.trainParam.min_grad   1e-6       Minimum performance gradient
net.trainParam.max_fail   5          Maximum validation failures
net.trainParam.searchFcn  'srchcha'  Name of line search routine to use


Parameters related to line search methods (not all used for all methods):


net.trainParam.scal_tol   20      Divide into delta to determine tolerance for linear search.
net.trainParam.alpha      0.001   Scale factor that determines sufficient reduction in perf.
net.trainParam.beta       0.1     Scale factor that determines sufficiently large step size.
net.trainParam.delta      0.01    Initial step size in interval location step.
net.trainParam.gama       0.1     Parameter to avoid small reductions in performance.
                                  Usually set to 0.1. (See use in srchcha.)
net.trainParam.low_lim    0.1     Lower limit on change in step size.
net.trainParam.up_lim     0.5     Upper limit on change in step size.
net.trainParam.maxstep    100     Maximum step length.
net.trainParam.minstep    1.0e-6  Minimum step length.
net.trainParam.bmax       26      Maximum step size.


Dimensions for these variables are:


Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Rj * length(net.inputWeights{i,j}.delays)


If VV is not [], it must be a structure of validation vectors,

VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial layer delay conditions.
VV.Q  - Validation batch size.
VV.TS - Validation time steps.

which is used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.

If TV is not [], it must be a structure of test vectors,

TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial layer delay conditions.
TV.Q  - Test batch size.
TV.TS - Test time steps.

which is used to test the generalization capability of the trained network.


trainbfg(code) returns useful information for each code string:


'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.


Examples


Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.


p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];


Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The trainbfg network training function is to be used.


Create and Test a Network

net = newff([0 5],[2 1],{'tansig','logsig'},'trainbfg');
a = sim(net,p)


Train and Retest the Network

net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


See newff, newcf, and newelm for other examples.


Network Use


You can create a standard network that uses trainbfg with newff, newcf, or
newelm.


To prepare a custom network to be trained with trainbfg:


1 Set net.trainFcn to 'trainbfg'. This will set net.trainParam to trainbfg's
  default parameters.
2 Set net.trainParam properties to desired values.


In either case, calling train with the resulting network will train the network
with trainbfg.


Algorithm


trainbfg can train any network as long as its weight, net input, and transfer
functions have derivative functions.


Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to the following:


X = X + a*dX;


where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function searchFcn is
used to locate the minimum point. The first search direction is the negative of
the gradient of performance. In succeeding iterations the search direction is
computed according to the following formula:


dX = -H\gX;


where gX is the gradient and H is an approximate Hessian matrix. See page 119
of Gill, Murray, and Wright (see reference below) for a more detailed discussion
of the BFGS quasi-Newton method.


Training stops when any of these conditions occurs:


• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also


newff, newcf, traingdm, traingda, traingdx, trainlm, trainrp,
traincgf, traincgb, trainscg, traincgp, trainoss
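The update X = X + a*dX with dX = -H\gX described in the Algorithm section can be illustrated with a small Python sketch of BFGS on a two-variable quadratic. Two substitutions are assumptions of this sketch, not the toolbox's method: it updates an approximation of the inverse Hessian directly (so the direction is -Hinv*g and no linear solve is needed), and it uses a simple backtracking line search in place of searchFcn. The function names and the test surface are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def bfgs_minimize(f, grad, x, max_iter=100, min_grad=1e-6):
    """Minimize f from starting point x with BFGS quasi-Newton steps."""
    n = len(x)
    # Inverse-Hessian estimate starts as the identity, so the first
    # search direction is the negative gradient.
    Hinv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < min_grad:        # gradient below min_grad: stop
            break
        dX = [-c for c in matvec(Hinv, g)]
        # Backtracking line search for the step a (stand-in for searchFcn)
        a, fx, slope = 1.0, f(x), dot(g, dX)
        while (f([xi + a * di for xi, di in zip(x, dX)]) > fx + 1e-4 * a * slope
               and a > 1e-10):
            a *= 0.5
        x_new = [xi + a * di for xi, di in zip(x, dX)]
        g_new = grad(x_new)
        s = [xn - xo for xn, xo in zip(x_new, x)]
        y = [gn - go for gn, go in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:                         # keeps Hinv positive definite
            Hy = matvec(Hinv, y)
            yHy = dot(y, Hy)
            for i in range(n):
                for j in range(n):
                    Hinv[i][j] += ((sy + yHy) * s[i] * s[j] / sy ** 2
                                   - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        x, g = x_new, g_new
    return x

def perf(x):
    """Illustrative performance surface with its minimum at (1, -2)."""
    return (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2

def perf_grad(x):
    return [2 * (x[0] - 1), 20 * (x[1] + 2)]

x_min = bfgs_minimize(perf, perf_grad, [0.0, 0.0])
print(x_min)
```

Keeping the inverse estimate is a common design choice for small problems because each direction then costs one matrix-vector product instead of a solve.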
Purpose Bayesian regularization backpropagation


Syntax [net,TR,Ac,El] = trainbr(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = trainbr(code)


Description trainbr is a network training function that updates the weight and bias values
according to Levenberg-Marquardt optimization. It minimizes a combination of
squared errors and weights, and then determines the correct combination so as
to produce a network that generalizes well. The process is called Bayesian
regularization.


trainbr(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial layer delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Either empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
      TR.mu    - Adaptive mu value.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.


Training occurs according to trainbr's training parameters, shown here
with their default values:


net.trainParam.epochs     100    Maximum number of epochs to train
net.trainParam.goal       0      Performance goal
net.trainParam.mu         0.005  Marquardt adjustment parameter
net.trainParam.mu_dec     0.1    Decrease factor for mu
net.trainParam.mu_inc     10     Increase factor for mu
net.trainParam.mu_max     1e10   Maximum value for mu
net.trainParam.max_fail   5      Maximum validation failures
net.trainParam.mem_reduc  1      Factor to use for memory/speed tradeoff
net.trainParam.min_grad   1e-10  Minimum performance gradient
net.trainParam.show       25     Epochs between showing progress
net.trainParam.time       inf    Maximum time to train in seconds


Dimensions for these variables are:


Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.


where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Rj * length(net.inputWeights{i,j}.delays)
If VV is not [], it must be a structure of validation vectors,

VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial layer delay conditions.
VV.Q  - Validation batch size.
VV.TS - Validation time steps.

which is normally used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.

If TV is not [], it must be a structure of test vectors,

TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial layer delay conditions.
TV.Q  - Test batch size.
TV.TS - Test time steps.

which is used to test the generalization capability of the trained network.


trainbr(code) returns useful information for each code string:


'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.


Examples


Here is a problem consisting of inputs p and targets t that we would like to
solve with a network. It involves fitting a noisy sine wave.

p = [-1:.05:1];
t = sin(2*pi*p)+0.1*randn(size(p));

Here a two-layer feed-forward network is created. The network's input ranges
from -1 to 1. The first layer has 20 tansig neurons, and the second layer has one
purelin neuron. The trainbr network training function is to be used. The plot
of the resulting network output should show a smooth response, without
overfitting.


Create a Network

net = newff([-1 1],[20,1],{'tansig','purelin'},'trainbr');


Train and Test the Network

net.trainParam.epochs = 50;
net.trainParam.show = 10;
net = train(net,p,t);
a = sim(net,p)
plot(p,a,p,t,'+')


Network Use


You can create a standard network that uses trainbr with newff, newcf, or
newelm.


To prepare a custom network to be trained with trainbr:


1 Set net.trainFcn to 'trainbr'. This will set net.trainParam to trainbr's
  default parameters.
2 Set net.trainParam properties to desired values.


In either case, calling train with the resulting network will train the network
with trainbr.


See newff, newcf, and newelm for examples.


Algorithm


trainbr can train any network as long as its weight, net input, and transfer
functions have derivative functions.


Bayesian regularization minimizes a linear combination of squared errors and
weights. It also modifies the linear combination so that at the end of training
the resulting network has good generalization qualities. See MacKay (Neural
Computation) and Foresee and Hagan (Proceedings of the International Joint
Conference on Neural Networks) for more detailed discussions of Bayesian
regularization.


This Bayesian regularization takes place within the Levenberg-Marquardt
algorithm. Backpropagation is used to calculate the Jacobian jX of
performance perf with respect to the weight and bias variables X. Each
variable is adjusted according to Levenberg-Marquardt,


jj = jX' * jX
je = jX' * E
dX = -(jj+I*mu) \ je


where E is all errors and I is the identity matrix.


The adaptive value mu is increased by mu_inc until the change shown above
results in a reduced performance value. The change is then made to the
network and mu is decreased by mu_dec.


The parameter mem_reduc indicates how to use memory and speed to calculate
the Jacobian jX. If mem_reduc is 1, then trainbr runs the fastest, but can
require a lot of memory. Increasing mem_reduc to 2 cuts some of the memory
required by a factor of two, but slows trainbr somewhat. Higher values
continue to decrease the amount of memory needed and increase the training
times.


Training stops when any one of these conditions occurs:


• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• mu exceeds mu_max.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also


newff, newcf, traingdm, traingda, traingdx, trainlm, trainrp,
traincgf, traincgb, trainscg, traincgp, trainoss


References


Foresee, F. D., and M. T. Hagan, "Gauss-Newton approximation to Bayesian
regularization," Proceedings of the 1997 International Joint Conference on
Neural Networks, 1997.


MacKay, D. J. C., "Bayesian interpolation," Neural Computation, vol. 4, no. 3,
pp. 415-447, 1992.
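The Levenberg-Marquardt step and the mu adaptation described in the Algorithm section can be sketched in Python on a two-parameter curve fit. This is an illustration under stated assumptions, not trainbr's implementation: the model y = p0*exp(p1*x), the data, the function name lm_fit, and the 2x2 solve by Cramer's rule are hypothetical; jX is taken to be the Jacobian of the errors E = t - y; mu starts at 0.005 and is scaled by mu_dec = 0.1 and mu_inc = 10 as in the defaults listed above, with mu_max = 1e10 assumed; and the Bayesian re-weighting of squared errors and weights that trainbr adds on top of this step is omitted.

```python
import math

def lm_fit(xs, ts, p, mu=0.005, mu_dec=0.1, mu_inc=10.0, mu_max=1e10, epochs=500):
    """Fit y = p[0]*exp(p[1]*x) to targets ts with Levenberg-Marquardt steps."""
    def errors(q):
        return [t - q[0] * math.exp(q[1] * x) for x, t in zip(xs, ts)]

    def sse(e):
        return sum(v * v for v in e)

    def jacobian(q):
        # One row per sample: derivatives of the error t - y w.r.t. q[0], q[1]
        return [[-math.exp(q[1] * x), -q[0] * x * math.exp(q[1] * x)] for x in xs]

    p = list(p)
    for _ in range(epochs):
        E = errors(p)
        perf = sse(E)
        J = jacobian(p)
        jj = [[sum(r[i] * r[j] for r in J) for j in range(2)] for i in range(2)]
        je = [sum(r[i] * e for r, e in zip(J, E)) for i in range(2)]
        while True:
            # Solve (jj + mu*I) dX = -je for the 2x2 case (Cramer's rule)
            a, b = jj[0][0] + mu, jj[0][1]
            c, d = jj[1][0], jj[1][1] + mu
            det = a * d - b * c
            dX = [-(d * je[0] - b * je[1]) / det, -(a * je[1] - c * je[0]) / det]
            trial = [p[0] + dX[0], p[1] + dX[1]]
            if sse(errors(trial)) < perf:   # performance reduced: accept, lower mu
                p = trial
                mu *= mu_dec
                break
            mu *= mu_inc                    # rejected: raise mu and retry
            if mu > mu_max:                 # stopping condition: mu exceeds mu_max
                return p, mu
    return p, mu

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ts = [2.0 * math.exp(0.5 * x) for x in xs]   # noise-free data from p = (2, 0.5)
p_fit, mu_final = lm_fit(xs, ts, [1.0, 0.2])
print(p_fit)
```

A large mu turns the step into a short gradient-descent step, while a small mu gives a Gauss-Newton step, which is why mu is raised after a rejected step and lowered after an accepted one.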
Purpose Cyclical order incremental training with learning functions


Syntax [net,TR,Ac,El] = trainc(net,Pd,Tl,Ai,Q,TS,VV,TV)


info = trainc(code)


Description trainc is not called directly. Instead it is called by train for networks whose
net.trainFcn property is set to 'trainc'.


trainc trains a network with weight and bias learning rules with incremental
updates after each presentation of an input. Inputs are presented in cyclic
order.


trainc(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd  - Delayed inputs.
Tl  - Layer targets.
Ai  - Initial layer delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Ignored.
TV  - Ignored.


and returns,


net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
Ac  - Collective layer outputs.
El  - Layer errors.
Training occurs according to trainc's training parameters, shown here with
their default values:


net.trainParam.epochs  100  Maximum number of epochs to train
net.trainParam.goal    0    Performance goal
net.trainParam.show    25   Epochs between displays (NaN for no displays)
net.trainParam.time    inf  Maximum time to train in seconds


Dimensions for these variables are:


Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix or [].
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Rj * length(net.inputWeights{i,j}.delays)


trainc does not implement validation or test vectors, so arguments VV and TV
are ignored.


trainc(code) returns useful information for each code string:


'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.


Network Use


You can create a standard network that uses trainc by calling newp.


To prepare a custom network to be trained with trainc:

1 Set net.trainFcn to 'trainc'.
  (This will set net.trainParam to trainc's default parameters.)
2 Set each net.inputWeights{i,j}.learnFcn to a learning function.
3 Set each net.layerWeights{i,j}.learnFcn to a learning function.
4 Set each net.biases{i}.learnFcn to a learning function. (Weight and bias
  learning parameters will automatically be set to default values for the given
  learning function.)


To train the network:


1 Set net.trainParam properties to desired values.
2 Set weight and bias learning parameters to desired values.
3 Call train.


See newp for training examples.


Algorithm


For each epoch, each vector (or sequence) is presented in order to the network,
with the weight and bias values updated accordingly after each individual
presentation.


Training stops when any of these conditions is met:


• The maximum number of epochs (repetitions) is reached.
• Performance has been minimized to the goal.
• The maximum amount of time has been exceeded.


See Also


newp, newlin, train
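Incremental, cyclic-order training can be sketched in Python with a perceptron rule (hard-limit output, w = w + e*p and b = b + e), since newp is the standard network that uses trainc. The rule, the data, and the function names are illustrative assumptions; the point is the update schedule: vectors are presented in the same order every epoch and the weights change immediately after each individual presentation, rather than once per epoch as in batch training.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def train_cyclic_perceptron(P, T, epochs=20):
    """Present the vectors in fixed (cyclic) order each epoch and update
    the weights and bias after every presentation."""
    w = [0.0] * len(P[0])
    b = 0.0
    for epoch in range(epochs):
        errors = 0
        for p, t in zip(P, T):               # same order every epoch
            a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
            e = t - a
            if e != 0:                        # update immediately
                errors += 1
                w = [wi + e * pi for wi, pi in zip(w, p)]
                b += e
        if errors == 0:                       # every vector classified correctly
            return w, b, epoch + 1
    return w, b, epochs

# Logical AND is linearly separable, so the perceptron rule converges.
P = [[0, 0], [0, 1], [1, 0], [1, 1]]
T = [0, 0, 0, 1]
w, b, used = train_cyclic_perceptron(P, T)
print(w, b, used)
```

Stopping when an epoch produces no errors plays the role of reaching the performance goal; the epochs argument plays the role of net.trainParam.epochs.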
Purpose Conjugate gradient backpropagation with Powell-Beale restarts


Syntax [net,TR,Ac,El] = traincgb(net,Pd,Tl,Ai,Q,TS,VV,TV)
info = traincgb(code)


Description traincgb is a network training function that updates weight and bias values
according to the conjugate gradient backpropagation with Powell-Beale
restarts.


traincgb(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,

net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial layer delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Either empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.







Training occurs according to traincgb's training parameters, shown here
with their default values:


net.trainParam.epochs     100        Maximum number of epochs to train
net.trainParam.show       25         Epochs between showing progress
net.trainParam.goal       0          Performance goal
net.trainParam.time       inf        Maximum time to train in seconds
net.trainParam.min_grad   1e-6       Minimum performance gradient
net.trainParam.max_fail   5          Maximum validation failures
net.trainParam.searchFcn  'srchcha'  Name of line search routine to use


Parameters related to line search methods (not all used for all methods):


net.trainParam.scal_tol   20      Divide into delta to determine tolerance for linear search.
net.trainParam.alpha      0.001   Scale factor that determines sufficient reduction in perf.
net.trainParam.beta       0.1     Scale factor that determines sufficiently large step size.
net.trainParam.delta      0.01    Initial step size in interval location step.
net.trainParam.gama       0.1     Parameter to avoid small reductions in performance.
                                  Usually set to 0.1. (See use in srchcha.)
net.trainParam.low_lim    0.1     Lower limit on change in step size.
net.trainParam.up_lim     0.5     Upper limit on change in step size.
net.trainParam.maxstep    100     Maximum step length.
net.trainParam.minstep    1.0e-6  Minimum step length.
net.trainParam.bmax       26      Maximum step size.
Dimensions for these variables are:


Pd - Nl x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Rj * length(net.inputWeights{i,j}.delays)


If VV is not [], it must be a structure of validation vectors,

VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial layer delay conditions.
VV.Q  - Validation batch size.
VV.TS - Validation time steps.

which is used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.

If TV is not [], it must be a structure of test vectors,

TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial layer delay conditions.
TV.Q  - Test batch size.
TV.TS - Test time steps.

which is used to test the generalization capability of the trained network.
traincgb(code) returns useful information for each code string:


'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.


Examples


Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.


p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];


Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgb network training function is to be used.


Create and Test a Network

net = newff([0 5],[2 1],{'tansig','logsig'},'traincgb');
a = sim(net,p)


Train and Retest the Network

net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


See newff, newcf, and newelm for other examples.


Network Use

You can create a standard network that uses traincgb with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgb:


1 Set net.trainFcn to 'traincgb'. This will set net.trainParam to traincgb's
  default parameters.
2 Set net.trainParam properties to desired values.


In either case, calling train with the resulting network will train the network
with traincgb.


Algorithm


traincgb can train any network as long as its weight, net input, and transfer
functions have derivative functions.


Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to the following:


X = X + a*dX;


where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function searchFcn is
used to locate the minimum point. The first search direction is the negative of
the gradient of performance. In succeeding iterations the search direction is
computed from the new gradient and the previous search direction according
to the formula:


dX = -gX + dX_old*Z;


where gX is the gradient. The parameter Z can be computed in several different
ways. The Powell-Beale variation of conjugate gradient is distinguished by two
features. First, the algorithm uses a test to determine when to reset the search
direction to the negative of the gradient. Second, the search direction is
computed from the negative gradient, the previous search direction, and the
last search direction before the previous reset. See Powell, Mathematical
Programming, for a more detailed discussion of the algorithm.


Training stops when any of these conditions occurs:


• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also


newff, newcf, traingdm, traingda, traingdx, trainlm, traincgp,
traincgf, traincgb, trainscg, trainoss, trainbfg


References


Powell, M. J. D., "Restart procedures for the conjugate gradient method,"
Mathematical Programming, vol. 12, pp. 241-254, 1977.


14-281 
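The reset test that distinguishes traincgb is only named in the Algorithm section above. The sketch below, in plain Python so it runs without the toolbox, assumes the restart criterion usually attributed to Powell (1977): reset the search direction to the negative gradient when successive gradients are far from orthogonal, |g_old'*g_new| >= 0.2*||g_new||^2. The 0.2 threshold and the function name powell_beale_restart are assumptions, not values read from traincgb.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def powell_beale_restart(g_new, g_old, threshold=0.2):
    """Return True if the search direction should be reset to -g_new.

    Assumed Powell (1977) test: restart when the previous and current
    gradients are far from orthogonal, |g_old . g_new| >= threshold*||g_new||^2.
    """
    return abs(dot(g_old, g_new)) >= threshold * dot(g_new, g_new)


# Nearly orthogonal successive gradients: keep the conjugate direction.
print(powell_beale_restart([1.0, 0.0], [0.1, 1.0]))  # False
# Strongly aligned successive gradients: reset to steepest descent.
print(powell_beale_restart([1.0, 0.0], [0.9, 0.1]))  # True
```

Conjugacy degrades when successive gradients stop being orthogonal, so a test of this kind decides when the accumulated direction is no longer worth keeping.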


traincgf





Purpose Conjugate gradient backpropagation with Fletcher-Reeves updates

Syntax [net,TR,Ac,El] = traincgf(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = traincgf(code)

Description traincgf is a network training function that updates weight and bias values
according to conjugate gradient backpropagation with Fletcher-Reeves
updates.

traincgf(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,

net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial input delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Either empty matrix [] or structure of test vectors.

and returns,

net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.
Training occurs according to traincgf's training parameters, shown here
with their default values:

net.trainParam.epochs     100        Maximum number of epochs to train
net.trainParam.show       25         Epochs between showing progress
net.trainParam.goal       0          Performance goal
net.trainParam.time       inf        Maximum time to train in seconds
net.trainParam.min_grad   1e-6       Minimum performance gradient
net.trainParam.max_fail   5          Maximum validation failures
net.trainParam.searchFcn  'srchcha'  Name of line search routine to use

Parameters related to line search methods (not all used for all methods):

net.trainParam.scal_tol   20
    Divide into delta to determine tolerance for linear search.
net.trainParam.alpha      0.001
    Scale factor that determines sufficient reduction in perf.
net.trainParam.beta       0.1
    Scale factor that determines sufficiently large step size.
net.trainParam.delta      0.01
    Initial step size in interval location step.
net.trainParam.gama       0.1
    Parameter to avoid small reductions in performance. Usually set to 0.1.
    (See use in srchcha.)
net.trainParam.low_lim    0.1        Lower limit on change in step size.
net.trainParam.up_lim     0.5        Upper limit on change in step size.
net.trainParam.maxstep    100        Maximum step length.
net.trainParam.minstep    1.0e-6     Minimum step length.
net.trainParam.bmax       26         Maximum step size.
Dimensions for these variables are:

Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)


If VV is not [], it must be a structure of validation vectors,

VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial input conditions.
VV.Q  - Validation batch size.
VV.TS - Validation time steps.

which is used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.
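The early-stopping rule just described can be sketched as a counter over per-epoch validation performance. This is a minimal Python illustration, not toolbox code: the function name early_stop_epoch and its list input are hypothetical, and, following the wording above, an epoch that is worse than or equal to the best value so far counts as a failure.

```python
def early_stop_epoch(val_perfs, max_fail=5):
    """Return the epoch index at which validation stopping triggers, or None.

    val_perfs holds the validation performance (lower is better) per epoch.
    An epoch that does not improve on the best value so far (worse or equal)
    counts as a failure; max_fail failures in a row stop training.
    """
    best = float("inf")
    fails = 0
    for epoch, vperf in enumerate(val_perfs):
        if vperf < best:
            best, fails = vperf, 0   # improvement resets the counter
        else:
            fails += 1               # no improvement this epoch
            if fails >= max_fail:
                return epoch
    return None


history = [1.0, 0.8, 0.7, 0.75, 0.72, 0.7, 0.9, 0.71]
print(early_stop_epoch(history))  # 7
```

Test vectors never enter this loop; as the text says, they only report how well the trained network generalizes.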


If TV is not [], it must be a structure of test vectors,

TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial input conditions.
TV.Q  - Test batch size.
TV.TS - Test time steps.

which is used to test the generalization capability of the trained network.
traincgf(code) returns useful information for each code string:

'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.

Examples Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.

p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];

Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgf network training function is to be used.

Create and Test a Network
net = newff([0 5],[2 1],{'tansig','logsig'},'traincgf');
a = sim(net,p)

Train and Retest the Network
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)

See newff, newcf, and newelm for other examples.

Network Use You can create a standard network that uses traincgf with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgf:

1 Set net.trainFcn to 'traincgf'. This will set net.trainParam to traincgf's
default parameters.

2 Set net.trainParam properties to desired values.

In either case, calling train with the resulting network will train the network
with traincgf.
Algorithm traincgf can train any network as long as its weight, net input, and transfer
functions have derivative functions.

Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to the following:

X = X + a*dX;

where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function searchFcn is
used to locate the minimum point. The first search direction is the negative of
the gradient of performance. In succeeding iterations the search direction is
computed from the new gradient and the previous search direction, according
to the formula:

dX = -gX + dX_old*Z;

where gX is the gradient. The parameter Z can be computed in several different
ways. For the Fletcher-Reeves variation of conjugate gradient it is computed
according to

Z = normnew_sqr/norm_sqr;

where norm_sqr is the norm square of the previous gradient and normnew_sqr
is the norm square of the current gradient. See page 78 of Scales (Introduction
to Non-Linear Optimization) for a more detailed discussion of the algorithm.
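To make the Fletcher-Reeves update concrete, the Python sketch below applies X = X + a*dX, dX = -gX + dX_old*Z and Z = normnew_sqr/norm_sqr to a small quadratic performance function perf(x) = 0.5*x'*A*x - b'*x. Two assumptions separate it from traincgf: the closed-form exact line search for a quadratic stands in for searchFcn (srchcha), and the matrix A and vector b are made-up example data.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_fletcher_reeves(A, b, x, iters):
    """Minimize 0.5*x'Ax - b'x (A symmetric positive definite) with FR updates."""
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]    # gradient gX = A*x - b
    d = [-gi for gi in g]                               # first direction: -gX
    for _ in range(iters):
        if dot(g, g) < 1e-24:                           # already at the minimum
            break
        Ad = matvec(A, d)
        a = -dot(g, d) / dot(d, Ad)                     # exact line search (quadratic only)
        x = [xi + a * di for xi, di in zip(x, d)]       # X = X + a*dX
        g_new = [gi + a * adi for gi, adi in zip(g, Ad)]  # gradient at the new X
        Z = dot(g_new, g_new) / dot(g, g)               # Z = normnew_sqr/norm_sqr
        d = [-gn + Z * di for gn, di in zip(g_new, d)]  # dX = -gX + dX_old*Z
        g = g_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(cg_fletcher_reeves(A, b, [2.0, 1.0], iters=2))  # close to [1/11, 7/11]
```

With exact line searches on an n-dimensional quadratic, conjugate gradient reaches the minimizer in at most n steps (two here), which is why the example stops after iters=2. A network's performance surface is not quadratic, which is why traincgf relies on an iterative line search instead.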


Training stops when any of these conditions occurs:

• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).
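The stopping rules above amount to a check that runs once per epoch. The Python sketch below is illustrative only: the dict p stands in for net.trainParam, using the documented field names and traincgf defaults, while the function name stop_reason and the returned messages are hypothetical.

```python
def stop_reason(epoch, elapsed, perf, grad_norm, val_fails, p):
    """Return a (hypothetical) message for the first stopping condition met, else None."""
    if epoch >= p["epochs"]:
        return "maximum epochs reached"
    if elapsed > p["time"]:
        return "maximum time exceeded"
    if perf <= p["goal"]:
        return "performance goal met"
    if grad_norm < p["min_grad"]:
        return "minimum gradient reached"
    if val_fails > p["max_fail"]:          # "more than max_fail times"
        return "validation stop"
    return None


# traincgf defaults from the parameter table above
p = {"epochs": 100, "time": float("inf"), "goal": 0.0,
     "min_grad": 1e-6, "max_fail": 5}
print(stop_reason(10, 1.2, 0.05, 1e-3, 0, p))  # None
print(stop_reason(10, 1.2, 0.05, 1e-8, 0, p))  # minimum gradient reached
```

Because time defaults to inf and goal to 0, in practice the epoch limit, the gradient threshold, or validation failures usually end training.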
See Also newff, newcf, traingdm, traingda, traingdx, trainlm, traincgp,
traincgb, trainscg, trainoss, trainbfg

References Scales, L. E., Introduction to Non-Linear Optimization, New York:
Springer-Verlag, 1985.
Purpose Conjugate gradient backpropagation with Polak-Ribiere updates

Syntax [net,TR,Ac,El] = traincgp(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = traincgp(code)

Description traincgp is a network training function that updates weight and bias values
according to conjugate gradient backpropagation with Polak-Ribiere
updates.

traincgp(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,

net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial input delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Either empty matrix [] or structure of test vectors.

and returns,

net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.
Training occurs according to traincgp's training parameters, shown here
with their default values:

net.trainParam.epochs     100        Maximum number of epochs to train
net.trainParam.show       25         Epochs between showing progress
net.trainParam.goal       0          Performance goal
net.trainParam.time       inf        Maximum time to train in seconds
net.trainParam.min_grad   1e-6       Minimum performance gradient
net.trainParam.max_fail   5          Maximum validation failures
net.trainParam.searchFcn  'srchcha'  Name of line search routine to use

Parameters related to line search methods (not all used for all methods):

net.trainParam.scal_tol   20
    Divide into delta to determine tolerance for linear search.
net.trainParam.alpha      0.001
    Scale factor that determines sufficient reduction in perf.
net.trainParam.beta       0.1
    Scale factor that determines sufficiently large step size.
net.trainParam.delta      0.01
    Initial step size in interval location step.
net.trainParam.gama       0.1
    Parameter to avoid small reductions in performance. Usually set to 0.1.
    (See use in srchcha.)
net.trainParam.low_lim    0.1        Lower limit on change in step size.
net.trainParam.up_lim     0.5        Upper limit on change in step size.
net.trainParam.maxstep    100        Maximum step length.
net.trainParam.minstep    1.0e-6     Minimum step length.
net.trainParam.bmax       26         Maximum step size.
Dimensions for these variables are:

Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)


If VV is not [], it must be a structure of validation vectors,

VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial input conditions.
VV.Q  - Validation batch size.
VV.TS - Validation time steps.

which is used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.


If TV is not [], it must be a structure of test vectors,

TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial input conditions.
TV.Q  - Test batch size.
TV.TS - Test time steps.

which is used to test the generalization capability of the trained network.
traincgp(code) returns useful information for each code string:

'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.

Examples Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.

p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];

Here a two-layer feed-forward network is created. The network's input ranges
from 0 to 5. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The traincgp network training function is to be used.

Create and Test a Network
net = newff([0 5],[2 1],{'tansig','logsig'},'traincgp');
a = sim(net,p)

Train and Retest the Network
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)

See newff, newcf, and newelm for other examples.

Network Use You can create a standard network that uses traincgp with newff, newcf, or
newelm.

To prepare a custom network to be trained with traincgp:

1 Set net.trainFcn to 'traincgp'. This will set net.trainParam to traincgp's
default parameters.

2 Set net.trainParam properties to desired values.

In either case, calling train with the resulting network will train the network
with traincgp.
Algorithm traincgp can train any network as long as its weight, net input, and transfer
functions have derivative functions.

Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to the following:

X = X + a*dX;

where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function searchFcn is
used to locate the minimum point. The first search direction is the negative of
the gradient of performance. In succeeding iterations the search direction is
computed from the new gradient and the previous search direction according
to the formula:

dX = -gX + dX_old*Z;

where gX is the gradient. The parameter Z can be computed in several different
ways. For the Polak-Ribiere variation of conjugate gradient it is computed
according to:

Z = ((gX - gX_old)'*gX)/norm_sqr;

where norm_sqr is the norm square of the previous gradient and gX_old is the
gradient on the previous iteration. See page 78 of Scales (Introduction to
Non-Linear Optimization) for a more detailed discussion of the algorithm.
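The Polak-Ribiere formula differs from Fletcher-Reeves only in how Z is computed. The Python sketch below shows both formulas on a pair of non-orthogonal gradients, then runs conjugate gradient with the Polak-Ribiere Z on a quadratic perf(x) = 0.5*x'*A*x - b'*x. As in any such sketch, the exact line search for a quadratic replaces traincgp's searchFcn, and A, b and all function names are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def polak_ribiere_z(g_new, g_old):
    """Z = ((gX - gX_old)'*gX)/norm_sqr, with norm_sqr = ||gX_old||^2."""
    diff = [gn - go for gn, go in zip(g_new, g_old)]
    return dot(diff, g_new) / dot(g_old, g_old)


def fletcher_reeves_z(g_new, g_old):
    """Z = normnew_sqr/norm_sqr, for comparison with traincgf."""
    return dot(g_new, g_new) / dot(g_old, g_old)


def cg_polak_ribiere(A, b, x, iters):
    """Minimize 0.5*x'Ax - b'x (A symmetric positive definite) with PR updates."""
    def matvec(M, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]    # gX = A*x - b
    d = [-gi for gi in g]                               # first direction: -gX
    for _ in range(iters):
        if dot(g, g) < 1e-24:                           # already at the minimum
            break
        Ad = matvec(A, d)
        a = -dot(g, d) / dot(d, Ad)                     # exact line search (quadratic only)
        x = [xi + a * di for xi, di in zip(x, d)]       # X = X + a*dX
        g_new = [gi + a * adi for gi, adi in zip(g, Ad)]
        Z = polak_ribiere_z(g_new, g)
        d = [-gn + Z * di for gn, di in zip(g_new, d)]  # dX = -gX + dX_old*Z
        g = g_new
    return x


# Non-orthogonal gradients: the two formulas disagree.
print(fletcher_reeves_z([0.5, 0.5], [1.0, 0.0]))  # 0.5
print(polak_ribiere_z([0.5, 0.5], [1.0, 0.0]))    # 0.0
print(cg_polak_ribiere([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [2.0, 1.0], 2))
```

On an exact quadratic with exact line searches, successive gradients are orthogonal, so both formulas give the same Z. They diverge on non-quadratic surfaces such as a network's performance function, where the Polak-Ribiere Z shrinks toward zero when the gradient barely changes.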


Training stops when any of these conditions occurs:

• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).
See Also newff, newcf, traingdm, traingda, traingdx, trainlm, trainrp,
traincgf, traincgb, trainscg, trainoss, trainbfg

References Scales, L. E., Introduction to Non-Linear Optimization, New York:
Springer-Verlag, 1985.
Purpose Gradient descent backpropagation

Syntax [net,TR,Ac,El] = traingd(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = traingd(code)

Description traingd is a network training function that updates weight and bias values
according to gradient descent.

traingd(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,

net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial input delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either an empty matrix [] or a structure of validation vectors.
TV  - Either an empty matrix [] or a structure of test vectors.

and returns,

net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.
Training occurs according to traingd's training parameters, shown here
with their default values:

net.trainParam.epochs    10     Maximum number of epochs to train
net.trainParam.goal      0      Performance goal
net.trainParam.lr        0.01   Learning rate
net.trainParam.max_fail  5      Maximum validation failures
net.trainParam.min_grad  1e-10  Minimum performance gradient
net.trainParam.show      25     Epochs between showing progress
net.trainParam.time      Inf    Maximum time to train in seconds

Dimensions for these variables are:

Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)


If VV or TV is not [], it must be a structure of validation/test vectors,

VV.PD, TV.PD - Validation/test delayed inputs.
VV.Tl, TV.Tl - Validation/test layer targets.
VV.Ai, TV.Ai - Validation/test initial input conditions.
VV.Q, TV.Q   - Validation/test batch size.
VV.TS, TV.TS - Validation/test time steps.

Validation vectors are used to stop training early if the network performance
on the validation vectors fails to improve or remains the same for max_fail
epochs in a row. Test vectors are used as a further check that the network is
generalizing well, but do not have any effect on training.
traingd(code) returns useful information for each code string:

'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.

Network Use You can create a standard network that uses traingd with newff, newcf, or
newelm.

To prepare a custom network to be trained with traingd:

1 Set net.trainFcn to 'traingd'. This will set net.trainParam to traingd's
default parameters.

2 Set net.trainParam properties to desired values.

In either case, calling train with the resulting network will train the network
with traingd.

See newff, newcf, and newelm for examples.

Algorithm traingd can train any network as long as its weight, net input, and transfer
functions have derivative functions.

Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to gradient descent:

dX = lr*dperf/dX

Training stops when any of these conditions occurs:

• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).

See Also newff, newcf, traingdm, traingda, traingdx, trainlm
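As a minimal sketch of the traingd update described under Algorithm, the Python loop below moves each variable by lr times dperf/dX. The formula gives the size of the step; the sketch takes it downhill, subtracting lr times the gradient. The two-variable quadratic performance function and the plain-list representation of X are assumptions for illustration, not toolbox internals.

```python
def traingd_sketch(grad, x, lr=0.01, epochs=10):
    """Batch gradient descent: each epoch moves X by lr*dperf/dX downhill."""
    for _ in range(epochs):
        g = grad(x)                                  # dperf/dX at the current X
        x = [xi - lr * gi for xi, gi in zip(x, g)]   # step against the gradient
    return x


# Hypothetical performance: perf(x) = (x0 - 3)^2 + (x1 + 1)^2, minimum at (3, -1).
def grad(x):
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


print(traingd_sketch(grad, [0.0, 0.0], lr=0.1, epochs=100))  # approaches [3, -1]
```

With lr = 0.1, each epoch shrinks the distance to the minimum of this function by a factor of 0.8. A learning rate above 1 would make the same iteration diverge, which is the problem the adaptive-rate variants address.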


traingda 





Purpose Gradient descent with adaptive learning rate backpropagation

Syntax [net,TR,Ac,El] = traingda(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = traingda(code)

Description traingda is a network training function that updates weight and bias values
according to gradient descent with adaptive learning rate.

traingda(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,

net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial input delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Empty matrix [] or structure of test vectors.

and returns,

net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
      TR.lr    - Adaptive learning rate.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.
Training occurs according to traingda's training parameters, shown here
with their default values:

net.trainParam.epochs        10     Maximum number of epochs to train
net.trainParam.goal          0      Performance goal
net.trainParam.lr            0.01   Learning rate
net.trainParam.lr_inc        1.05   Ratio to increase learning rate
net.trainParam.lr_dec        0.7    Ratio to decrease learning rate
net.trainParam.max_fail      5      Maximum validation failures
net.trainParam.max_perf_inc  1.04   Maximum performance increase
net.trainParam.min_grad      1e-10  Minimum performance gradient
net.trainParam.show          25     Epochs between showing progress
net.trainParam.time          Inf    Maximum time to train in seconds

Dimensions for these variables are:

Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)


If VV or TV is not [], it must be a structure of validation/test vectors,

VV.PD, TV.PD - Validation/test delayed inputs.
VV.Tl, TV.Tl - Validation/test layer targets.
VV.Ai, TV.Ai - Validation/test initial input conditions.
VV.Q, TV.Q   - Validation/test batch size.
VV.TS, TV.TS - Validation/test time steps.
Validation vectors are used to stop training early if the network performance
on the validation vectors fails to improve or remains the same for max_fail
epochs in a row. Test vectors are used as a further check that the network is
generalizing well, but do not have any effect on training.

traingda(code) returns useful information for each code string:

'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.

Network Use You can create a standard network that uses traingda with newff, newcf, or
newelm.

To prepare a custom network to be trained with traingda:

1 Set net.trainFcn to 'traingda'. This will set net.trainParam to traingda's
default parameters.

2 Set net.trainParam properties to desired values.

In either case, calling train with the resulting network will train the network
with traingda.

See newff, newcf, and newelm for examples.

Algorithm traingda can train any network as long as its weight, net input, and transfer
functions have derivative functions.

Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to gradient descent:

dX = lr*dperf/dX

At each epoch, if performance decreases toward the goal, then the learning rate
is increased by the factor lr_inc. If performance increases by more than the
factor max_perf_inc, the learning rate is adjusted by the factor lr_dec and the
change, which increased the performance, is not made.
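The adaptive rule above can be sketched in a few lines of Python, using traingda's default lr, lr_inc, lr_dec and max_perf_inc. One detail is an assumption of this sketch: when performance rises, but by no more than the factor max_perf_inc, the step is kept and lr is left unchanged. The scalar performance function and the name traingda_sketch are hypothetical.

```python
def traingda_sketch(perf, grad, x, lr=0.01, lr_inc=1.05, lr_dec=0.7,
                    max_perf_inc=1.04, epochs=10):
    """Gradient descent with the adaptive learning-rate rule described above."""
    p = perf(x)
    for _ in range(epochs):
        g = grad(x)
        x_new = [xi - lr * gi for xi, gi in zip(x, g)]
        p_new = perf(x_new)
        if p_new > p * max_perf_inc:
            lr *= lr_dec              # change not made: X unchanged, smaller lr
            continue
        if p_new < p:
            lr *= lr_inc              # performance improved: larger lr
        x, p = x_new, p_new           # keep the step
    return x, lr


def perf(x):
    return (x[0] - 3.0) ** 2          # hypothetical performance, minimum at 3


def grad(x):
    return [2.0 * (x[0] - 3.0)]


# A deliberately too-large learning rate: the first step is rejected.
x, lr = traingda_sketch(perf, grad, [0.0], lr=1.5, epochs=1)
print(x, lr)                          # [0.0] and lr of about 1.05
# With the default lr the rate grows by lr_inc each improving epoch.
x, lr = traingda_sketch(perf, grad, [0.0], epochs=100)
print(x)                              # close to [3.0]
```

Growing lr geometrically while performance keeps falling lets the sketch start cautiously at 0.01 and still converge in about a hundred epochs, while the lr_dec branch pulls the rate back as soon as a step overshoots badly.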


Training stops when any of these conditions occurs:

• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).

See Also newff, newcf, traingd, traingdm, traingdx, trainlm
Purpose Gradient descent with momentum backpropagation

Syntax [net,TR,Ac,El] = traingdm(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = traingdm(code)

Description traingdm is a network training function that updates weight and bias values
according to gradient descent with momentum.

traingdm(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,

net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial input delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Empty matrix [] or structure of test vectors.

and returns,

net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.
Training occurs according to traingdm's training parameters, shown here
with their default values:

net.trainParam.epochs    10     Maximum number of epochs to train
net.trainParam.goal      0      Performance goal
net.trainParam.lr        0.01   Learning rate
net.trainParam.max_fail  5      Maximum validation failures
net.trainParam.mc        0.9    Momentum constant
net.trainParam.min_grad  1e-10  Minimum performance gradient
net.trainParam.show      25     Epochs between showing progress
net.trainParam.time      Inf    Maximum time to train in seconds

Dimensions for these variables are:

Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)


If VV or TV is not [], it must be a structure of validation/test vectors,

VV.PD, TV.PD - Validation/test delayed inputs.
VV.Tl, TV.Tl - Validation/test layer targets.
VV.Ai, TV.Ai - Validation/test initial input conditions.
VV.Q, TV.Q   - Validation/test batch size.
VV.TS, TV.TS - Validation/test time steps.

Validation vectors are used to stop training early if the network performance
on the validation vectors fails to improve or remains the same for max_fail
epochs in a row. Test vectors are used as a further check that the network is
generalizing well, but do not have any effect on training.


traingdm(code) returns useful information for each code string:

'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.

Network Use You can create a standard network that uses traingdm with newff, newcf, or
newelm.

To prepare a custom network to be trained with traingdm:

1 Set net.trainFcn to 'traingdm'. This will set net.trainParam to traingdm's
default parameters.

2 Set net.trainParam properties to desired values.

In either case, calling train with the resulting network will train the network
with traingdm.

See newff, newcf, and newelm for examples.
Algorithm traingdm can train any network as long as its weight, net input, and transfer
functions have derivative functions.

Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to gradient descent with momentum,

dX = mc*dXprev + lr*(1-mc)*dperf/dX

where dXprev is the previous change to the weight or bias.
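A minimal Python sketch of the momentum rule dX = mc*dXprev + lr*(1-mc)*dperf/dX follows. As in the traingd case, the gradient term is applied in the descent direction. The scalar quadratic performance function and the name traingdm_sketch are assumptions for illustration.

```python
def traingdm_sketch(grad, x, lr=0.01, mc=0.9, epochs=10):
    """Gradient descent with momentum: dX = mc*dXprev + lr*(1-mc)*dperf/dX (downhill)."""
    dx_prev = [0.0] * len(x)                 # no previous change before epoch 1
    for _ in range(epochs):
        g = grad(x)
        dx = [mc * dp - lr * (1.0 - mc) * gi for dp, gi in zip(dx_prev, g)]
        x = [xi + di for xi, di in zip(x, dx)]
        dx_prev = dx                         # remembered for the next epoch
    return x


def grad(x):
    return [2.0 * (x[0] - 3.0)]              # gradient of hypothetical perf (x0 - 3)^2


print(traingdm_sketch(grad, [0.0], lr=0.1, mc=0.9, epochs=1))    # about [0.06]
print(traingdm_sketch(grad, [0.0], lr=0.1, mc=0.9, epochs=500))  # close to [3.0]
```

Setting mc = 0 recovers the plain traingd step. Values of mc near 1 let the previous change dominate, which smooths the trajectory across small bumps in the performance surface at the cost of some overshoot.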
Training stops when any of these conditions occurs:

• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).

See Also newff, newcf, traingd, traingda, traingdx, trainlm
Purpose Gradient descent with momentum and adaptive learning rate backpropagation

Syntax [net,TR,Ac,El] = traingdx(net,Pd,Tl,Ai,Q,TS,VV,TV)

info = traingdx(code)

Description traingdx is a network training function that updates weight and bias values
according to gradient descent with momentum and an adaptive learning rate.

traingdx(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,

net - Neural network.
Pd  - Delayed input vectors.
Tl  - Layer target vectors.
Ai  - Initial input delay conditions.
Q   - Batch size.
TS  - Time steps.
VV  - Either empty matrix [] or structure of validation vectors.
TV  - Empty matrix [] or structure of test vectors.

and returns,

net - Trained network.
TR  - Training record of various values over each epoch:
      TR.epoch - Epoch number.
      TR.perf  - Training performance.
      TR.vperf - Validation performance.
      TR.tperf - Test performance.
      TR.lr    - Adaptive learning rate.
Ac  - Collective layer outputs for last epoch.
El  - Layer errors for last epoch.
Training occurs according to traingdx's training parameters, shown here
with their default values:

net.trainParam.epochs        10     Maximum number of epochs to train
net.trainParam.goal          0      Performance goal
net.trainParam.lr            0.01   Learning rate
net.trainParam.lr_inc        1.05   Ratio to increase learning rate
net.trainParam.lr_dec        0.7    Ratio to decrease learning rate
net.trainParam.max_fail      5      Maximum validation failures
net.trainParam.max_perf_inc  1.04   Maximum performance increase
net.trainParam.mc            0.9    Momentum constant
net.trainParam.min_grad      1e-10  Minimum performance gradient
net.trainParam.show          25     Epochs between showing progress
net.trainParam.time          Inf    Maximum time to train in seconds

Dimensions for these variables are:

Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.

where

Ni  = net.numInputs
Nl  = net.numLayers
LD  = net.numLayerDelays
Ri  = net.inputs{i}.size
Si  = net.layers{i}.size
Vi  = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)
If VV or TV is not [], it must be a structure of validation/test vectors,

VV.PD, TV.PD - Validation/test delayed inputs.
VV.Tl, TV.Tl - Validation/test layer targets.
VV.Ai, TV.Ai - Validation/test initial input conditions.
VV.Q, TV.Q   - Validation/test batch size.
VV.TS, TV.TS - Validation/test time steps.

Validation vectors are used to stop training early if the network performance
on the validation vectors fails to improve or remains the same for max_fail
epochs in a row. Test vectors are used as a further check that the network is
generalizing well, but do not have any effect on training.

traingdx(code) returns useful information for each code string:

'pnames'    - Names of training parameters.
'pdefaults' - Default training parameters.

Network Use You can create a standard network that uses traingdx with newff, newcf, or
newelm.

To prepare a custom network to be trained with traingdx:

1 Set net.trainFcn to 'traingdx'. This will set net.trainParam to traingdx's
default parameters.

2 Set net.trainParam properties to desired values.

In either case, calling train with the resulting network will train the network
with traingdx.

See newff, newcf, and newelm for examples.

Algorithm traingdx can train any network as long as its weight, net input, and transfer
functions have derivative functions.

Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to gradient descent with momentum,

dX = mc*dXprev + lr*mc*dperf/dX

where dXprev is the previous change to the weight or bias.

For each epoch, if performance decreases toward the goal, then the learning
rate is increased by the factor lr_inc. If performance increases by more than
the factor max_perf_inc, the learning rate is adjusted by the factor lr_dec and
the change, which increased the performance, is not made.
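The Python sketch below combines the momentum step with the adaptive learning-rate rule, using traingdx's defaults. It follows the momentum formula exactly as printed above (mc multiplies the gradient term) and applies that term in the descent direction. Two behaviors are assumptions of this sketch rather than documented toolbox behavior: after a rejected change the remembered momentum is discarded, and a small accepted increase leaves lr unchanged. The performance function and function name are hypothetical.

```python
def traingdx_sketch(perf, grad, x, lr=0.01, lr_inc=1.05, lr_dec=0.7,
                    max_perf_inc=1.04, mc=0.9, epochs=10):
    """Momentum plus adaptive learning rate, following the formula printed above."""
    p = perf(x)
    dx_prev = [0.0] * len(x)
    for _ in range(epochs):
        g = grad(x)
        # dX = mc*dXprev + lr*mc*dperf/dX, with the gradient term taken downhill
        dx = [mc * dp - lr * mc * gi for dp, gi in zip(dx_prev, g)]
        x_new = [xi + di for xi, di in zip(x, dx)]
        p_new = perf(x_new)
        if p_new > p * max_perf_inc:
            lr *= lr_dec                     # change not made; shrink lr
            dx_prev = [0.0] * len(x)         # assumption: drop momentum after a rejection
            continue
        if p_new < p:
            lr *= lr_inc                     # performance improved: grow lr
        x, p, dx_prev = x_new, p_new, dx
    return x, lr


def perf(x):
    return (x[0] - 3.0) ** 2                 # hypothetical performance, minimum at 3


def grad(x):
    return [2.0 * (x[0] - 3.0)]


x, lr = traingdx_sketch(perf, grad, [0.0], epochs=1)
print(x, lr)    # about [0.054] and 0.0105
x, lr = traingdx_sketch(perf, grad, [0.0], epochs=100)
print(perf(x))  # far below the starting value of 9
```

Momentum lets the step keep growing while lr_inc raises the rate; when the accumulated velocity overshoots by more than max_perf_inc, the lr_dec branch and the momentum reset damp it again.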


Training stops when any of these conditions occurs:

• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).

See Also newff, newcf, traingd, traingdm, traingda, trainlm
Purpose
Levenberg-Marquardt backpropagation


Syntax
[net,TR] = trainlm(net,Pd,Tl,Ai,Q,TS,VV,TV)
info = trainlm(code)


Description
trainlm is a network training function that updates weight and bias values
according to Levenberg-Marquardt optimization.

trainlm(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd - Delayed input vectors.
Tl - Layer target vectors.
Ai - Initial input delay conditions.
Q - Batch size.
TS - Time steps.
VV - Either empty matrix [] or structure of validation vectors.
TV - Either empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR - Training record of various values over each epoch:
TR.epoch - Epoch number.
TR.perf - Training performance.
TR.vperf - Validation performance.
TR.tperf - Test performance.
TR.mu - Adaptive mu value.


Training occurs according to trainlm's training parameters, shown here
with their default values:


net.trainParam.epochs     100     Maximum number of epochs to train
net.trainParam.goal       0       Performance goal
net.trainParam.max_fail   5       Maximum validation failures
net.trainParam.mem_reduc  1       Factor to use for memory/speed tradeoff
net.trainParam.min_grad   1e-10   Minimum performance gradient
net.trainParam.mu         0.001   Initial mu
net.trainParam.mu_dec     0.1     mu decrease factor
net.trainParam.mu_inc     10      mu increase factor
net.trainParam.mu_max     1e10    Maximum mu
net.trainParam.show       25      Epochs between showing progress
net.trainParam.time       inf     Maximum time to train in seconds





Dimensions for these variables are:


Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.


Ni = net.numInputs
Nl = net.numLayers
LD = net.numLayerDelays
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Vi = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)
If VV or TV is not [], it must be a structure of vectors,


VV.PD, TV.PD - Validation/test delayed inputs.
VV.Tl, TV.Tl - Validation/test layer targets.
VV.Ai, TV.Ai - Validation/test initial input conditions.
VV.Q, TV.Q - Validation/test batch size.
VV.TS, TV.TS - Validation/test time steps.


Validation vectors are used to stop training early if the network performance
on the validation vectors fails to improve or remains the same for max_fail
epochs in a row. Test vectors are used as a further check that the network is
generalizing well, but do not have any effect on training.


trainlm(code) returns useful information for each code string:


'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.


Network Use
You can create a standard network that uses trainlm with newff, newcf, or
newelm.


To prepare a custom network to be trained with trainlm:

1 Set net.trainFcn to 'trainlm'. This will set net.trainParam to trainlm's
default parameters.

2 Set net.trainParam properties to desired values.


In either case, calling train with the resulting network will train the network
with trainlm.


See newff, newcf, and newelm for examples.


Algorithm
trainlm can train any network as long as its weight, net input, and transfer
functions have derivative functions.


Backpropagation is used to calculate the Jacobian jX of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to Levenberg-Marquardt,

jj = jX' * jX

je = jX' * E

dX = -(jj+I*mu) \ je

where E is all errors and I is the identity matrix.


The adaptive value mu is increased by mu_inc until the change above results in
a reduced performance value. The change is then made to the network and mu
is decreased by mu_dec.
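The update and mu adaptation described above can be sketched in plain MATLAB, without the toolbox. This is a minimal illustration under stated assumptions, not trainlm's code: the model y = w(1)*exp(w(2)*x), the noise-free data, and the names model and jac are hypothetical; the errors E are taken as model output minus target, so that dX = -(jj+I*mu)\je descends; and the loop simply ends once mu exceeds mu_max. The mu values match the defaults in the parameter table.

```matlab
% Minimal Levenberg-Marquardt sketch (hypothetical curve fit, not toolbox code).
x = (0:0.5:2)';
t = 2*exp(0.5*x);                                 % targets from w = [2; 0.5]
model = @(w) w(1)*exp(w(2)*x);
jac   = @(w) [exp(w(2)*x), w(1)*x.*exp(w(2)*x)];  % Jacobian of errors w.r.t. w
w = [1; 0];
mu = 0.001; mu_dec = 0.1; mu_inc = 10; mu_max = 1e10;
E = model(w) - t;  perf = sum(E.^2);
for epoch = 1:100
  jX = jac(w);
  jj = jX'*jX;  je = jX'*E;
  accepted = false;
  while mu <= mu_max
    dX = -(jj + eye(2)*mu) \ je;                  % LM step
    Enew = model(w + dX) - t;
    if sum(Enew.^2) < perf
      w = w + dX;  E = Enew;  perf = sum(E.^2);
      mu = mu*mu_dec;                             % success: decrease mu
      accepted = true;
      break
    end
    mu = mu*mu_inc;                               % failure: increase mu and retry
  end
  if ~accepted, break; end                        % mu exceeded mu_max
end
```

Large mu makes the step a short gradient-descent step, while small mu approaches the Gauss-Newton step, which is why shrinking mu after each success tends to speed up convergence near the solution.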


The parameter mem_reduc indicates how to use memory and speed to calculate
the Jacobian jX. If mem_reduc is 1, then trainlm runs the fastest, but can
require a lot of memory. Increasing mem_reduc to 2 cuts some of the memory
required by a factor of two, but slows trainlm somewhat. Higher values
continue to decrease the amount of memory needed and increase training
times.
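One way to picture this tradeoff: jj = jX'*jX and je = jX'*E are sums over the rows of jX, so they can be accumulated from mem_reduc row blocks without holding the whole Jacobian at once. The sketch below is an assumption about the general idea, not a description of how trainlm actually partitions its computation; it slices a small stand-in matrix only to show that the blockwise sums equal the full products.

```matlab
% Blockwise accumulation of jj and je (illustrative stand-in data).
jX = reshape(1:24, 6, 4);      % stand-in Jacobian: 6 error terms, 4 weights
E  = (1:6)';                   % stand-in error vector
mem_reduc = 2;
Q = size(jX, 1);
edges = round(linspace(0, Q, mem_reduc + 1));
jj = zeros(4);  je = zeros(4, 1);
for k = 1:mem_reduc
  idx = edges(k)+1 : edges(k+1);
  Jk = jX(idx, :);             % in practice each block would be computed on demand
  jj = jj + Jk'*Jk;
  je = je + Jk'*E(idx);
end
```

Only one block of rows needs to exist at a time, at the cost of one extra pass per block.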


Training stops when any of these conditions occurs:


• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• mu exceeds mu_max.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also newff, newcf, traingd, traingdm, traingda, traingdx
Purpose
One step secant backpropagation


Syntax
[net,TR,Ac,El] = trainoss(net,Pd,Tl,Ai,Q,TS,VV,TV)
info = trainoss(code)


Description
trainoss is a network training function that updates weight and bias values
according to the one step secant method.

trainoss(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd - Delayed input vectors.
Tl - Layer target vectors.
Ai - Initial input delay conditions.
Q - Batch size.
TS - Time steps.
VV - Either empty matrix [] or structure of validation vectors.
TV - Either empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR - Training record of various values over each epoch:
TR.epoch - Epoch number.
TR.perf - Training performance.
TR.vperf - Validation performance.
TR.tperf - Test performance.
Ac - Collective layer outputs for last epoch.
El - Layer errors for last epoch.
Training occurs according to trainoss's training parameters, shown here
with their default values:


net.trainParam.epochs     100        Maximum number of epochs to train
net.trainParam.show       25         Epochs between showing progress
net.trainParam.goal       0          Performance goal
net.trainParam.time       inf        Maximum time to train in seconds
net.trainParam.min_grad   1e-6       Minimum performance gradient
net.trainParam.max_fail   5          Maximum validation failures
net.trainParam.searchFcn  'srchcha'  Name of line search routine to use


Parameters related to line search methods (not all used for all methods):


net.trainParam.scal_tol   20
Divide into delta to determine tolerance for linear search.

net.trainParam.alpha      0.001
Scale factor, which determines sufficient reduction in perf.

net.trainParam.beta       0.1
Scale factor, which determines sufficiently large step size.

net.trainParam.delta      0.01
Initial step size in interval location step.

net.trainParam.gama       0.1
Parameter to avoid small reductions in performance. Usually set to 0.1.
(See use in srchcha.)

net.trainParam.low_lim    0.1        Lower limit on change in step size.
net.trainParam.up_lim     0.5        Upper limit on change in step size.
net.trainParam.maxstep    100        Maximum step length.
net.trainParam.minstep    1.0e-6     Minimum step length.
net.trainParam.bmax       26         Maximum step size.
Dimensions for these variables are:


Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.


Ni = net.numInputs
Nl = net.numLayers
LD = net.numLayerDelays
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Vi = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)

If VV is not [], it must be a structure of validation vectors,


VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial input conditions.
VV.Q - Validation batch size.
VV.TS - Validation time steps.

which is used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.


If TV is not [], it must be a structure of test vectors,

TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial input conditions.
TV.Q - Test batch size.
TV.TS - Test time steps.


which is used to test the generalization capability of the trained network.
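The early-stopping rule for VV can be sketched as a small loop over per-epoch validation performances. The vperf values are made up for illustration, and the failure counter follows the "fails to improve or remains the same for max_fail epochs in a row" wording; the stopping conditions phrase the rule as "more than max_fail" increases, so the toolbox's exact off-by-one may differ from this sketch.

```matlab
% Sketch of validation-based early stopping on hypothetical vperf data.
vperf = [1.00 0.80 0.70 0.72 0.71 0.75 0.74 0.76 0.90 0.60];
max_fail = 5;
best = inf; bestEpoch = 0; fails = 0; stopEpoch = numel(vperf);
for epoch = 1:numel(vperf)
  if vperf(epoch) < best
    best = vperf(epoch); bestEpoch = epoch; fails = 0;  % improvement resets the count
  else
    fails = fails + 1;           % worse or equal counts as a failure
  end
  if fails >= max_fail
    stopEpoch = epoch;           % stop; weights from bestEpoch are the ones to keep
    break
  end
end
```

Here training stops at epoch 8, five epochs after the best validation value at epoch 3, so the later value at epoch 10 is never seen.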
trainoss(code) returns useful information for each code string:


'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.


Examples
Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.


p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];


Here a two-layer feed-forward network is created. The network's input ranges
from [0 to 5]. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The trainoss network training function is to be used.


Create and Test a Network
net = newff([0 5],[2 1],{'tansig','logsig'},'trainoss');
a = sim(net,p)


Train and Retest the Network


net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


See newff, newcf, and newelm for other examples.


Network Use
You can create a standard network that uses trainoss with newff, newcf, or
newelm.


To prepare a custom network to be trained with trainoss:


1 Set net.trainFcn to 'trainoss'. This will set net.trainParam to trainoss's
default parameters.
2 Set net.trainParam properties to desired values.


In either case, calling train with the resulting network will train the network
with trainoss.
Algorithm
trainoss can train any network as long as its weight, net input, and transfer
functions have derivative functions.


Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to the following:


X = X + a*dX;


where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function searchFcn is
used to locate the minimum point. The first search direction is the negative of
the gradient of performance. In succeeding iterations the search direction is
computed from the new gradient and the previous steps and gradients
according to the following formula:


dX = -gX + Ac*X_step + Bc*dgX;


where gX is the gradient, X_step is the change in the weights on the previous
iteration, dgX is the change in the gradient from the last iteration, and Ac
and Bc are scalars computed from these three quantities. See Battiti (Neural
Computation) for a more detailed discussion of the one step secant algorithm.


Training stops when any of these conditions occurs:


• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also newff, newcf, traingdm, traingda, traingdx, trainlm, trainrp,
traincgf, traincgb, trainscg, traincgp, trainbfg


References Battiti, "First and second order methods for learning: Between
steepest descent and Newton's method," Neural Computation, vol. 4, no. 2,
pp. 141-166, 1992.
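The one step secant iteration from the Algorithm section can be sketched on a small quadratic. This is an illustration under assumptions: the quadratic is hypothetical, the exact line search stands in for searchFcn and is valid only for quadratics, and the Ac and Bc expressions come from the memoryless BFGS form that Battiti describes, reconstructed here rather than copied from trainoss.

```matlab
% Sketch of the one step secant direction on perf = 0.5*x'*A*x - b'*x.
A = [4 1; 1 3];  b = [1; 2];
grad = @(x) A*x - b;
x = [0; 0];
gX = grad(x);
dX = -gX;                              % first direction: negative gradient
for iter = 1:10
  if norm(gX) < 1e-10, break; end
  a = -(gX'*dX) / (dX'*A*dX);          % exact minimizer along dX (quadratics only)
  X_step = a*dX;
  x = x + X_step;                      % X = X + a*dX
  gNew = grad(x);
  dgX = gNew - gX;  gX = gNew;
  sy = X_step'*dgX;
  Bc = (X_step'*gX) / sy;
  Ac = -(1 + (dgX'*dgX)/sy)*Bc + (dgX'*gX)/sy;
  dX = -gX + Ac*X_step + Bc*dgX;       % dX = -gX + Ac*X_step + Bc*dgX
end
```

With exact line searches on a quadratic, this direction coincides with a conjugate gradient direction, so the two-variable problem is solved in about two iterations. No matrix is stored, only the previous step and gradient change, which is the method's appeal over full BFGS.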
Purpose
Random order incremental training with learning functions


Syntax
[net,TR,Ac,El] = trainr(net,Pd,Tl,Ai,Q,TS,VV,TV)
info = trainr(code)


Description
trainr is not called directly. Instead it is called by train for networks whose
net.trainFcn property is set to 'trainr'.


trainr trains a network with weight and bias learning rules with incremental
updates after each presentation of an input. Inputs are presented in random
order.


trainr(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd - Delayed inputs.
Tl - Layer targets.
Ai - Initial input conditions.
Q - Batch size.
TS - Time steps.
VV - Ignored.
TV - Ignored.


and returns,
net - Trained network.
TR - Training record of various values over each epoch:
TR.epoch - Epoch number.
TR.perf - Training performance.
Ac - Collective layer outputs.
El - Layer errors.
Training occurs according to trainr's training parameters, shown here with
their default values:


net.trainParam.epochs  100  Maximum number of epochs to train
net.trainParam.goal    0    Performance goal
net.trainParam.show    25   Epochs between displays (NaN for no displays)
net.trainParam.time    inf  Maximum time to train in seconds


Dimensions for these variables are:
Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix or [].
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.


Ni = net.numInputs
Nl = net.numLayers
LD = net.numLayerDelays
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Vi = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)

trainr does not implement validation or test vectors, so arguments VV and TV
are ignored.

trainr(code) returns useful information for each code string:


'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.


Network Use
You can create a standard network that uses trainr by calling newc or newsom.

To prepare a custom network to be trained with trainr:


1 Set net.trainFcn to 'trainr'.
(This will set net.trainParam to trainr's default parameters.)
2 Set each net.inputWeights{i,j}.learnFcn to a learning function.
3 Set each net.layerWeights{i,j}.learnFcn to a learning function.
4 Set each net.biases{i}.learnFcn to a learning function. (Weight and bias
learning parameters will automatically be set to default values for the given
learning function.)


To train the network:


1 Set net.trainParam properties to desired values.
2 Set weight and bias learning parameters to desired values.
3 Call train.


See newc and newsom for training examples.
For each epoch, all training vectors (or sequences) are each presented once in
a different random order with the network and weight and bias values updated
accordingly after each individual presentation.
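A minimal sketch of the random order presentation scheme, assuming a single linear neuron whose learning function is a Widrow-Hoff style rule. The data, lr, and epoch count are hypothetical, and nothing here calls the toolbox. Each epoch visits every input vector exactly once, in an order drawn by randperm, and updates the weight and bias immediately after each presentation.

```matlab
% Sketch of random order incremental training (hypothetical data and rule).
p = [0 1 2 3; 1 0 1 2];           % Q = 4 input vectors (columns)
t = [1 -2]*p + 0.5;               % noise-free targets
w = zeros(1, 2);  b = 0;
lr = 0.05;
Q = size(p, 2);
for epoch = 1:2000
  for q = randperm(Q)             % each vector once, in a new random order
    e = t(q) - (w*p(:,q) + b);    % error for this single presentation
    w = w + lr*e*p(:,q)';         % update right after the presentation
    b = b + lr*e;
  end
end
```

Because the targets are generated without noise, every presentation order drives w toward [1 -2] and b toward 0.5; the random order changes the path, not the answer.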


Training stops when any of these conditions are met:


• The maximum number of epochs (repetitions) is reached.
• Performance has been minimized to the goal.
• The maximum amount of time has been exceeded.


See Also newp, newlin, train
Purpose
Resilient backpropagation


Syntax
[net,TR,Ac,El] = trainrp(net,Pd,Tl,Ai,Q,TS,VV,TV)
info = trainrp(code)


Description
trainrp is a network training function that updates weight and bias values
according to the resilient backpropagation algorithm (RPROP).

trainrp(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd - Delayed input vectors.
Tl - Layer target vectors.
Ai - Initial input delay conditions.
Q - Batch size.
TS - Time steps.
VV - Either empty matrix [] or structure of validation vectors.
TV - Either empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR - Training record of various values over each epoch:
TR.epoch - Epoch number.
TR.perf - Training performance.
TR.vperf - Validation performance.
TR.tperf - Test performance.
Ac - Collective layer outputs for last epoch.
El - Layer errors for last epoch.
Training occurs according to trainrp's training parameters, shown here
with their default values:


net.trainParam.epochs    100    Maximum number of epochs to train
net.trainParam.show      25     Epochs between showing progress
net.trainParam.goal      0      Performance goal
net.trainParam.time      inf    Maximum time to train in seconds
net.trainParam.min_grad  1e-6   Minimum performance gradient
net.trainParam.max_fail  5      Maximum validation failures
net.trainParam.lr        0.01   Learning rate
net.trainParam.delt_inc  1.2    Increment to weight change
net.trainParam.delt_dec  0.5    Decrement to weight change
net.trainParam.delta0    0.07   Initial weight change
net.trainParam.deltamax  50.0   Maximum weight change


Dimensions for these variables are:


Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.


Ni = net.numInputs
Nl = net.numLayers
LD = net.numLayerDelays
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Vi = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)
If VV is not [], it must be a structure of validation vectors,


VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial input conditions.
VV.Q - Validation batch size.
VV.TS - Validation time steps.


which is used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.


If TV is not [], it must be a structure of test vectors,


TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial input conditions.
TV.Q - Test batch size.
TV.TS - Test time steps.

which is used to test the generalization capability of the trained network.


trainrp(code) returns useful information for each code string:


'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.


Examples
Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.


p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];


Here a two-layer feed-forward network is created. The network's input ranges
from [0 to 5]. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The trainrp network training function is to be used.


Create and Test a Network


net = newff([0 5],[2 1],{'tansig','logsig'},'trainrp');
a = sim(net,p)

Train and Retest the Network
net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)


See newff, newcf, and newelm for other examples.


You can create a standard network that uses trainrp with newff, newcf, or
newelm.


To prepare a custom network to be trained with trainrp:


1 Set net.trainFcn to 'trainrp'. This will set net.trainParam to trainrp's
default parameters.
2 Set net.trainParam properties to desired values.


In either case, calling train with the resulting network will train the network
with trainrp.


trainrp can train any network as long as its weight, net input, and transfer
functions have derivative functions.


Backpropagation is used to calculate derivatives of performance perf with
respect to the weight and bias variables X. Each variable is adjusted according
to the following:


dX = deltaX.*sign(gX);


where the elements of deltaX are all initialized to delta0 and gX is the
gradient. At each iteration the elements of deltaX are modified. If an element
of gX changes sign from one iteration to the next, then the corresponding
element of deltaX is decreased by the factor delt_dec. If an element of gX
maintains the same sign from one iteration to the next, then the corresponding
element of deltaX is increased by the factor delt_inc. See Riedmiller and Braun,
Proceedings of the IEEE International Conference on Neural Networks.
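The RPROP rule above can be sketched on a toy function. Assumptions: the quadratic x1^2 + 10*x2^2 is hypothetical, the step is taken against the gradient (x = x - deltaX.*sign(grad)), and growth is capped at deltamax as that parameter's description suggests. The parameter values are the defaults from the table.

```matlab
% Sketch of the RPROP update on a hypothetical quadratic (not toolbox code).
grad = @(x) [2*x(1); 20*x(2)];     % gradient of x1^2 + 10*x2^2
x = [3; -2];
delta0 = 0.07; delt_inc = 1.2; delt_dec = 0.5; deltamax = 50.0;
deltaX = delta0*ones(size(x));     % all elements start at delta0
gPrev = zeros(size(x));
for epoch = 1:100
  gX = grad(x);
  s = gX .* gPrev;                 % > 0: same sign, < 0: sign change
  deltaX(s > 0) = min(deltaX(s > 0)*delt_inc, deltamax);
  deltaX(s < 0) = deltaX(s < 0)*delt_dec;
  x = x - deltaX.*sign(gX);        % only the sign of each gradient element is used
  gPrev = gX;
end
```

Since only signs enter the step, the factor of 10 between the two curvatures has no direct effect on the step sizes; each element of deltaX adapts on its own.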


Training stops when any of these conditions occurs:


• The maximum number of epochs (repetitions) is reached.


• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also newff, newcf, traingdm, traingda, traingdx, trainlm, traincgp,
traincgf, traincgb, trainscg, trainoss, trainbfg


References Riedmiller, M., and H. Braun, "A direct adaptive method for faster
backpropagation learning: The RPROP algorithm," Proceedings of the IEEE
International Conference on Neural Networks, San Francisco, 1993.
Sequential order incremental training with learning functions


[net,TR,Ac,El] = trains(net,Pd,Tl,Ai,Q,TS,VV,TV)
info = trains(code)


trains is not called directly. Instead it is called by train for networks whose
net.trainFcn property is set to 'trains'.


trains trains a network with weight and bias learning rules with sequential
updates. The sequence of inputs is presented to the network with updates
occurring after each time step.


This incremental training algorithm is commonly used for adaptive
applications.


trains takes these inputs:


net - Neural network.
Pd - Delayed inputs.
Tl - Layer targets.
Ai - Initial input conditions.
Q - Batch size.
TS - Time steps.
VV - Ignored.
TV - Ignored.
and after training the network with its weight and bias learning functions
returns:
net - Updated network.
TR - Training record.
TR.timesteps - Number of time steps.
TR.perf - Performance for each time step.
Ac - Collective layer outputs.
El - Layer errors.
Training occurs according to trains's training parameter, shown here with its
default value:
net.trainParam.passes  1  Number of times to present sequence


Dimensions for these variables are:


Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Zij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix or [].
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.
Ac - Nl x (LD+TS) cell array, each element Ac{i,k} is an Si x Q matrix.
El - Nl x TS cell array, each element El{i,k} is an Si x Q matrix or [].


Ni = net.numInputs
Nl = net.numLayers
LD = net.numLayerDelays
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Vi = net.targets{i}.size
Zij = Ri * length(net.inputWeights{i,j}.delays)

trains(code) returns useful information for each code string:

'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.


Network Use
You can create a standard network that uses trains for adapting by calling
newp or newlin.


To prepare a custom network to adapt with trains:


1 Set net.adaptFcn to 'trains'.
(This will set net.adaptParam to trains's default parameters.)
2 Set each net.inputWeights{i,j}.learnFcn to a learning function.
3 Set each net.layerWeights{i,j}.learnFcn to a learning function.
4 Set each net.biases{i}.learnFcn to a learning function. (Weight and bias
learning parameters will automatically be set to default values for the given
learning function.)

To allow the network to adapt:


1 Set weight and bias learning parameters to desired values.
2 Call adapt.


See newp and newlin for adaption examples.


Each weight and bias is updated according to its learning function after each
time step in the input sequence.
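A sketch of sequential updating, assuming one linear neuron with a Widrow-Hoff style rule as its learning function. The sine input sequence, lr, and the passes value of 3 are hypothetical choices for illustration. Unlike the random order used by trainr, the time steps are visited strictly in sequence, with an update after each one.

```matlab
% Sketch of sequential order incremental updating (hypothetical sequence).
TS = 200;
p = sin(0.3*(1:TS));              % input sequence, one value per time step
t = 2*p + 0.5;                    % target sequence
w = 0;  b = 0;  lr = 0.1;
passes = 3;                       % analogous to the passes parameter
for pass = 1:passes
  for ts = 1:TS                   % time steps presented in order
    e = t(ts) - (w*p(ts) + b);    % error at this time step
    w = w + lr*e*p(ts);           % update immediately, before the next step
    b = b + lr*e;
  end
end
```

Presenting the sequence more than once simply repeats the same ordered sweep, which is what the passes parameter controls.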


See Also newp, newlin, train, trainb, trainc, trainr
Scaled conjugate gradient backpropagation


[net,TR,Ac,El] = trainscg(net,Pd,Tl,Ai,Q,TS,VV,TV)
info = trainscg(code)

trainscg is a network training function that updates weight and bias values
according to the scaled conjugate gradient method.
trainscg(net,Pd,Tl,Ai,Q,TS,VV,TV) takes these inputs,


net - Neural network.
Pd - Delayed input vectors.
Tl - Layer target vectors.
Ai - Initial input delay conditions.
Q - Batch size.
TS - Time steps.
VV - Either empty matrix [] or structure of validation vectors.
TV - Either empty matrix [] or structure of test vectors.


and returns,


net - Trained network.
TR - Training record of various values over each epoch:
TR.epoch - Epoch number.
TR.perf - Training performance.
TR.vperf - Validation performance.
TR.tperf - Test performance.
Ac - Collective layer outputs for last epoch.
El - Layer errors for last epoch.
Training occurs according to trainscg's training parameters, shown here
with their default values:


net.trainParam.epochs    100     Maximum number of epochs to train
net.trainParam.show      25      Epochs between showing progress
net.trainParam.goal      0       Performance goal
net.trainParam.time      inf     Maximum time to train in seconds
net.trainParam.min_grad  1e-6    Minimum performance gradient
net.trainParam.max_fail  5       Maximum validation failures
net.trainParam.sigma     5.0e-5  Determines change in weight for
                                 second derivative approximation.
net.trainParam.lambda    5.0e-7  Parameter for regulating the
                                 indefiniteness of the Hessian.


Dimensions for these variables are:


Pd - No x Ni x TS cell array, each element Pd{i,j,ts} is a Dij x Q matrix.
Tl - Nl x TS cell array, each element Tl{i,ts} is a Vi x Q matrix.
Ai - Nl x LD cell array, each element Ai{i,k} is an Si x Q matrix.


Ni = net.numInputs
Nl = net.numLayers
LD = net.numLayerDelays
Ri = net.inputs{i}.size
Si = net.layers{i}.size
Vi = net.targets{i}.size
Dij = Ri * length(net.inputWeights{i,j}.delays)

If VV is not [], it must be a structure of validation vectors,


VV.PD - Validation delayed inputs.
VV.Tl - Validation layer targets.
VV.Ai - Validation initial input conditions.
VV.Q - Validation batch size.
VV.TS - Validation time steps.

which is used to stop training early if the network performance on the
validation vectors fails to improve or remains the same for max_fail epochs in
a row.


If TV is not [], it must be a structure of test vectors,


TV.PD - Test delayed inputs.
TV.Tl - Test layer targets.
TV.Ai - Test initial input conditions.
TV.Q - Test batch size.
TV.TS - Test time steps.
which is used to test the generalization capability of the trained network.


trainscg(code) returns useful information for each code string:


'pnames' - Names of training parameters.
'pdefaults' - Default training parameters.


Here is a problem consisting of inputs p and targets t that we would like to
solve with a network.


p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];


Here a two-layer feed-forward network is created. The network's input ranges
from [0 to 5]. The first layer has two tansig neurons, and the second layer
has one logsig neuron. The trainscg network training function is used.


Create and Test a Network


net = newff([0 5],[2 1],{'tansig','logsig'},'trainscg');
a = sim(net,p)


Train and Retest the Network


net.trainParam.epochs = 50;
net.trainParam.show = 10;
net.trainParam.goal = 0.1;
net = train(net,p,t);
a = sim(net,p)
See newff, newcf, and newelm for other examples. 
You can create a standard network that uses trainscg with newff, newcf, or
newelm.


To prepare a custom network to be trained with trainscg:

1 Set net.trainFcn to 'trainscg'. This will set net.trainParam to trainscg's
default parameters.

2 Set net.trainParam properties to desired values.


In either case, calling train with the resulting network will train the network 
with trainscg. 


trainscg can train any network as long as its weight, net input, and transfer 
functions have derivative functions. Backpropagation is used to calculate 
derivatives of performance perf with respect to the weight and bias variables 
X. 


The scaled conjugate gradient algorithm is based on conjugate directions, as in
traincgp, traincgf and traincgb, but this algorithm does not perform a line
search at each iteration. See Moller (Neural Networks) for a more detailed
discussion of the scaled conjugate gradient algorithm.
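The step that replaces the line search can be sketched for a single iteration. This follows the structure of Moller's method as an assumption, not trainscg's code: the gradient change along the search direction, measured over a distance set by sigma, approximates the Hessian times that direction; lambda is added to keep the curvature estimate positive; and the step size follows from that estimate. The quadratic and starting point are hypothetical, and the full algorithm's adjustment of lambda between iterations is omitted.

```matlab
% One scaled-conjugate-gradient style step on perf = 0.5*x'*A*x - b'*x.
A = [4 1; 1 3];  b = [1; 2];
grad = @(x) A*x - b;
x = [0.5; -1];
pdir = -grad(x);                              % current search direction
sigma = 5.0e-5;  lambda = 5.0e-7;             % trainParam defaults
sigk = sigma / norm(pdir);
s = (grad(x + sigk*pdir) - grad(x)) / sigk;   % approximates Hessian*pdir
delta = pdir'*s + lambda*norm(pdir)^2;        % regularized curvature along pdir
if delta <= 0
  lambda = 2*(lambda - delta/norm(pdir)^2);   % raise lambda so curvature is positive
  delta = pdir'*s + lambda*norm(pdir)^2;
end
alphak = (pdir'*(-grad(x))) / delta;          % step size without a line search
xNew = x + alphak*pdir;
```

For a quadratic, s equals A*pdir up to rounding, so the step lands almost exactly at the minimum along pdir; the small lambda term leaves a residual slope of order 1e-6.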


Training stops when any of these conditions occurs:


• The maximum number of epochs (repetitions) is reached.
• The maximum amount of time has been exceeded.
• Performance has been minimized to the goal.
• The performance gradient falls below min_grad.
• Validation performance has increased more than max_fail times since the
  last time it decreased (when using validation).


See Also newff, newcf, traingdm, traingda, traingdx, trainlm, trainrp,
traincgf, traincgb, trainbfg, traincgp, trainoss


References Moller, M. F., "A scaled conjugate gradient algorithm for fast supervised
learning," Neural Networks, vol. 6, pp. 525-533, 1993.
Transform data using a precalculated minimum and maximum value
[PN] = tramnmx(P,minp,maxp)


tramnmx transforms the network input set using minimum and maximum
values that were previously computed by premnmx. This function needs to be
used when a network has been trained using data normalized by premnmx. All
subsequent inputs to the network need to be transformed using the same
normalization.


tramnmx(P,minp,maxp) takes these inputs,


P - R x Q matrix of input (column) vectors.
minp - R x 1 vector containing original minimums for each input.
maxp - R x 1 vector containing original maximums for each input.
and returns,


PN - R x Q matrix of normalized input vectors.


Here is the code to normalize a given data set, so that the inputs and targets
will fall in the range [-1,1], using premnmx, and also code to train a network
with the normalized data.


p = [-10 -7.5 -5 -2.5 0 2.5 5 7.5 10];
t = [0 7.07 -10 -7.07 0 7.07 10 7.07 0];
[pn,minp,maxp,tn,mint,maxt] = premnmx(p,t);

net = newff(minmax(pn),[5 1],{'tansig','purelin'},'trainlm');
net = train(net,pn,tn);


If we then receive new inputs to apply to the trained network, we will use
tramnmx to transform them first. Then the transformed inputs can be used to
simulate the previously trained network. The network output must also be
unnormalized using postmnmx.

p2 = [4 -7];
[p2n] = tramnmx(p2,minp,maxp);
an = sim(net,p2n);
[a] = postmnmx(an,mint,maxt);


Algorithm pn = 2*(p-minp)/(maxp-minp) - 1
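The formula can be checked in base MATLAB without premnmx or tramnmx. scale and unscale are hypothetical helpers written for this sketch; the point is that new inputs are mapped with the minp and maxp stored from the training data, not with their own minimum and maximum.

```matlab
% Min/max scaling with stored statistics (base MATLAB only).
p = [-10 -7.5 -5 -2.5 0 2.5 5 7.5 10];
minp = min(p, [], 2);  maxp = max(p, [], 2);       % R x 1, one value per input row
scale   = @(x, lo, hi) 2*bsxfun(@rdivide, bsxfun(@minus, x, lo), hi - lo) - 1;
unscale = @(xn, lo, hi) bsxfun(@plus, bsxfun(@times, (xn + 1)/2, hi - lo), lo);
pn  = scale(p, minp, maxp);        % pn = 2*(p-minp)/(maxp-minp) - 1
p2n = scale([4 -7], minp, maxp);   % new inputs reuse the training minp/maxp
p2  = unscale(p2n, minp, maxp);    % inverse mapping (postmnmx does this for outputs)
```

With minp = -10 and maxp = 10, the new inputs 4 and -7 map to 0.4 and -0.7.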


See Also premnmx, prestd, prepca, trastd, trapca
Principal component transformation
[Ptrans] = trapca(P,transMat)


trapca preprocesses the network input training set by applying the principal
component transformation that was previously computed by prepca. This
function needs to be used when a network has been trained using data
normalized by prepca. All subsequent inputs to the network need to be
transformed using the same normalization.


trapca(P,transMat) takes these inputs,


P - R x Q matrix of centered input (column) vectors.
transMat - Transformation matrix.


and returns,


Ptrans - Transformed data set.


Here is the code to perform a principal component analysis and retain only 
those components that contribute more than two percent to the variance in the 
data set. prestd is called first to create zero mean data, which is needed for 
prepca. 


p=[-1.5 -0.58 0.21 -0.96 -0.79; -2.2 -0.87 0.31 -1.4 -1.2]; 
t= [-0.08 3.4 -0.82 0.69 3.1] | 

[pn,meanp,stdp,tnmeant ,stdt] = prestd(p, 七 ) ; 

[ptrans,transMat] = prepca(pn,0.02) ; 

net = newff(minmax(ptrans),[5 1],{ tansig purelin' }，trainlm' ) ; 
net = train(net,ptranstn) ; 


H 开 we then receive new inputs to apply to the trained network, we will use 
trastd and trapca to transform them first. Then the transformed inputs can 
be used to simulate the previously trained network. The network output must 
also be unnormalized using poststd. 


p2 = [1.5 -0.8;0.05 -0.3];

[p2n] = trastd(p2,meanp,stdp);
[p2trans] = trapca(p2n,transMat);
an = sim(net,p2trans);

[a] = poststd(an,meant,stdt);
Algorithm   Ptrans = transMat*P;
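The transformation is a single matrix product. The Python sketch below spells it out with plain lists; trapca_sketch is a hypothetical helper, and the 2-by-2 rotation used as transMat is an illustrative assumption rather than the output of prepca. Keeping only the first row of the matrix shows how dropping components reduces the input dimension.

```python
def trapca_sketch(P, trans_mat):
    """Compute Ptrans = transMat * P, with matrices stored as lists of rows.

    P is R-by-Q (one column per input vector); trans_mat is M-by-R, where
    M <= R is the number of principal components that were retained.
    """
    R, Q = len(P), len(P[0])
    if any(len(row) != R for row in trans_mat):
        raise ValueError("transMat must have one column per row of P")
    return [[sum(row[k] * P[k][q] for k in range(R)) for q in range(Q)]
            for row in trans_mat]


# Hypothetical transformation matrix with orthonormal rows (a rotation).
trans_mat = [[0.6, 0.8], [-0.8, 0.6]]
P = [[3.0, 0.0],
     [4.0, 1.0]]          # two centered input vectors: [3; 4] and [0; 1]
full = trapca_sketch(P, trans_mat)          # approx [[5.0, 0.8], [0.0, 0.6]]
reduced = trapca_sketch(P, trans_mat[:1])   # keep only the first component
```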


See Also   prestd, premnmx, prepca, trastd, tramnmx

trastd   Preprocess data using a precalculated mean and standard deviation

[PN] = trastd(P,meanp,stdp)


trastd preprocesses the network training set using the mean and standard
deviation that were previously computed by prestd. This function needs to be
used when a network has been trained using data normalized by prestd. All
subsequent inputs to the network need to be transformed using the same
normalization.


trastd(P,meanp,stdp) takes these inputs,


P - R x Q matrix of input (column) vectors.
meanp - R x 1 vector containing the original means for each input.
stdp - R x 1 vector containing the original standard deviations for each
input.
and returns,


PN - R x Q matrix of normalized input vectors.


Here is the code to normalize a given data set so that the inputs and targets
will have means of zero and standard deviations of 1.


p = [-0.92 0.73 -0.47 0.74 0.29; -0.08 0.86 -0.67 -0.52 0.93];
t = [-0.08 3.4 -0.82 0.69 3.1];

[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);

net = newff(minmax(pn),[5 1],{'tansig' 'purelin'},'trainlm');
net = train(net,pn,tn);


If we then receive new inputs to apply to the trained network, we will use
trastd to transform them first. Then the transformed inputs can be used to 
simulate the previously trained network. The network output must also be 
unnormalized using poststd. 


p2 = [1.5 -0.8;0.05 -0.3];
[p2n] = trastd(p2,meanp,stdp);
an = sim(net,p2n);

[a] = poststd(an,meant,stdt);


Algorithm   pn = (p-meanp)/stdp;
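The same normalization, applied row by row, can be sketched in Python. This assumes per-row statistics and the sample (N-1) standard deviation, on the assumption that prestd relies on MATLAB's default std, which normalizes by N-1; prestd_params, trastd_sketch and poststd_sketch are hypothetical helpers, not toolbox functions.

```python
from statistics import mean, stdev


def prestd_params(rows):
    """Per-row mean and sample standard deviation (the role of meanp/stdp)."""
    return [mean(r) for r in rows], [stdev(r) for r in rows]


def trastd_sketch(rows, meanp, stdp):
    """Normalize each row: pn = (p - meanp)/stdp."""
    return [[(x - m) / s for x in r] for r, m, s in zip(rows, meanp, stdp)]


def poststd_sketch(rows, meant, stdt):
    """Undo the normalization: p = stdt*pn + meant."""
    return [[x * s + m for x in r] for r, m, s in zip(rows, meant, stdt)]


p = [[1.0, 2.0, 3.0]]
meanp, stdp = prestd_params(p)             # [2.0], [1.0]
pn = trastd_sketch(p, meanp, stdp)         # [[-1.0, 0.0, 1.0]]
p2n = trastd_sketch([[5.0]], meanp, stdp)  # new input, same statistics: [[3.0]]
```

The important design point, mirrored from the text, is that new inputs reuse the stored meanp and stdp instead of recomputing statistics from the new data.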


See Also   premnmx, prepca, prestd, trapca, tramnmx

tribas   Triangular basis transfer function





[Figure: Triangular Basis Function, a = tribas(n)]


A = tribas(N) 

info = tribas(code) 

tribas is a transfer function. Transfer functions calculate a layer's output from
its net input.

tribas(N) takes one input,


N - S x Q matrix of net input (column) vectors.
and returns each element of N passed through a triangular basis function.
tribas(code) returns useful information for each code string:
'deriv' - Name of derivative function.
'name' - Full name.
'output' - Output range.
'active' - Active input range.


Here we create a plot of the tribas transfer function. 


n = -5:0.1:5; 
a = tribas(n);
plot(n,a)


To change a network so that a layer uses tribas, set
net.layers{i}.transferFcn to 'tribas'.

Call sim to simulate the network with tribas.


Algorithm   tribas(N) calculates its output according to:


tribas(n) = 1 - abs(n), if -1 <= n <= 1; = 0, otherwise.
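The piecewise rule can be checked with a few sample points. This is a minimal Python sketch of the formula only; tribas_sketch is a hypothetical name and operates on one scalar at a time, whereas the toolbox function works on whole matrices.

```python
def tribas_sketch(n):
    """Triangular basis: 1 - |n| when -1 <= n <= 1, and 0 otherwise."""
    return 1.0 - abs(n) if -1.0 <= n <= 1.0 else 0.0


samples = [-1.5, -1.0, -0.5, 0.0, 0.5, 2.0]
values = [tribas_sketch(x) for x in samples]
print(values)  # [0.0, 0.0, 0.5, 1.0, 0.5, 0.0]
```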


See Also   sim, radbas

vec2ind   Convert vectors to indices

ind = vec2ind(vec)

ind2vec and vec2ind allow indices to be represented either by themselves or
as vectors containing a 1 in the row of the index they represent.


vec2ind(vec) takes one argument,


vec - Matrix of vectors, each containing a single 1
and returns the indices of the 1s.


Here four vectors (each containing only one “1” element) are defined and the
indices of the 1s are found.


vec = [1 0 0 0; 0 0 1 0; 0 1 0 1]
ind = vec2ind(vec) 
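The conversion in both directions can be sketched in Python with nested lists. vec2ind_sketch and ind2vec_sketch are hypothetical helpers that return dense lists and use 1-based indices to match the MATLAB convention.

```python
def vec2ind_sketch(vec):
    """Return the 1-based row index of the single 1 in each column."""
    n_rows, n_cols = len(vec), len(vec[0])
    return [next(r + 1 for r in range(n_rows) if vec[r][c] == 1)
            for c in range(n_cols)]


def ind2vec_sketch(ind, n_rows):
    """Inverse mapping: one column per index, with a 1 in that row."""
    return [[1 if i == r + 1 else 0 for i in ind] for r in range(n_rows)]


vec = [[1, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 1, 0, 1]]
ind = vec2ind_sketch(vec)
print(ind)  # [1, 3, 2, 3]
```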


See Also   ind2vec

Glossary

ADALINE - An acronym for a linear neuron: ADAptive LINear Element.


adaption - A training method that proceeds through the specified sequence of
inputs, calculating the output, error and network adjustment for each input
vector in the sequence as the inputs are presented.


adaptive learning rate - A learning rate that is adjusted according to an
algorithm during training to minimize training time.


adaptive filter - A network that contains delays and whose weights are
adjusted after each new input vector is presented. The network “adapts” to
changes in the input signal properties if such occur. This kind of filter is used
in long distance telephone lines to cancel echoes.


architecture - A description of the number of layers in a neural network,
each layer's transfer function, the number of neurons per layer, and the
connections between layers.


backpropagation learning rule - A learning rule in which weights and biases
are adjusted by error-derivative (delta) vectors backpropagated through the
network. Backpropagation is commonly applied to feedforward multilayer
networks. Sometimes this rule is called the generalized delta rule.


backtracking search - Linear search routine that begins with a step
multiplier of 1 and then backtracks until an acceptable reduction in the
performance is obtained.


batch - A matrix of input (or target) vectors applied to the network
“simultaneously.” Changes to the network weights and biases are made just
once for the entire set of vectors in the input matrix. (This term is being
replaced by the more descriptive expression “concurrent vectors.”)


batching - The process of presenting a set of input vectors for simultaneous
calculation of a matrix of output vectors and/or new weights and biases.


Bayesian framework - Assumes that the weights and biases of the network
are random variables with specified distributions.


BFGS quasi-Newton algorithm - A variation of Newton's optimization
algorithm, in which an approximation of the Hessian matrix is obtained from
gradients computed at each iteration of the algorithm.


bias - A neuron parameter that is summed with the neuron's weighted inputs
and passed through the neuron's transfer function to generate the neuron's
output.





bias vector - A column vector of bias values for a layer of neurons. 


Brent's search - A linear search that is a hybrid combination of the golden
section search and a quadratic interpolation.


Charalambous' search - A hybrid line search that uses a cubic interpolation,
together with a type of sectioning.


cascade forward network - A layered network in which each layer only
receives inputs from previous layers.


classification - An association of an input vector with a particular target
vector.


competitive layer - A layer of neurons in which only the neuron with
maximum net input has an output of 1 and all other neurons have an output of
0. Neurons compete with each other for the right to respond to a given input
vector.


competitive learning - The unsupervised training of a competitive layer with
the instar rule or Kohonen rule. Individual neurons learn to become feature
detectors. After training, the layer categorizes input vectors among its
neurons.


competitive transfer function - Accepts a net input vector for a layer and
returns neuron outputs of 0 for all neurons except for the “winner,” the neuron
associated with the most positive element of the net input n.


concurrent input vectors - Name given to a matrix of input vectors that are
to be presented to a network “simultaneously.” All the vectors in the matrix will
be used in making just one set of changes in the weights and biases.


conjugate gradient algorithm - In the conjugate gradient algorithms a search
is performed along conjugate directions, which generally produces faster
convergence than a search along the steepest descent directions.


connection - A one-way link between neurons in a network.


connection strength - The strength of a link between two neurons in a
network. The strength, often called weight, determines the effect that one
neuron has on another.


cycle - A single presentation of an input vector, calculation of output, and new
weights and biases.







dead neurons - A competitive layer neuron that never won any competition
during training and so has not become a useful feature detector. Dead neurons
do not respond to any of the training vectors.


decision boundary - A line, determined by the weight and bias vectors, for
which the net input n is zero.


delta rule - See the Widrow-Hoff learning rule.


delta vector - The delta vector for a layer is the derivative of a network's
output error with respect to that layer's net input vector.


distance - The distance between neurons, calculated from their positions with
a distance function.


distance function - A particular way of calculating distance, such as the
Euclidean distance between two vectors.


early stopping - A technique based on dividing the data into three subsets. The
first subset is the training set used for computing the gradient and updating
the network weights and biases. The second subset is the validation set. When
the validation error increases for a specified number of iterations, the training
is stopped, and the weights and biases at the minimum of the validation error
are returned. The third subset is the test set. It is used to verify the network
design.


epoch - The presentation of the set of training (input and/or target) vectors to
a network and the calculation of new weights and biases. Note that training
vectors can be presented one at a time or all together in a batch.


error jumping - A sudden increase in a network's sum-squared error during
training. This is often due to too large a learning rate.


error ratio - A training parameter used with adaptive learning rate and
momentum training of backpropagation networks.


error vector - The difference between a network's output vector in response to
an input vector and an associated target output vector.


feedback network - A network with connections from a layer's output to that
layer's input. The feedback connection can be direct or pass through several
layers.


feedforward network - A layered network in which each layer only receives
inputs from previous layers.





Fletcher-Reeves update - A method developed by Fletcher and Reeves for 
computing a set of conjugate directions. These directions are used as search 
directions as part of a conjugate gradient optimization procedure. 


function approximation - The task performed by a network trained to 
respond to inputs with an approximation of a desired function. 


generalization - An attribute of a network whose output for a new input vector
tends to be close to outputs for similar input vectors in its training set.


generalized regression network - Approximates a continuous function to an
arbitrary accuracy, given a sufficient number of hidden neurons.


global minimum - The lowest value of a function over the entire range of its
input parameters. Gradient descent methods adjust weights and biases in
order to find the global minimum of error for a network.


golden section search - A linear search that does not require the calculation
of the slope. The interval containing the minimum of the performance is
subdivided at each iteration of the search, and one subdivision is eliminated at
each iteration.


gradient descent - The process of making changes to weights and biases,
where the changes are proportional to the derivatives of network error with
respect to those weights and biases. This is done to minimize network error.


hard-limit transfer function - A transfer function that maps inputs greater than
or equal to 0 to 1, and all other values to 0.
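The hard-limit rule is a single comparison. A minimal Python sketch follows; hard_limit is a hypothetical name, since this entry does not name the toolbox function.

```python
def hard_limit(n):
    """Return 1 for n >= 0 and 0 for every other input."""
    return 1 if n >= 0 else 0


print([hard_limit(x) for x in (-2.0, -0.1, 0.0, 3.5)])  # [0, 0, 1, 1]
```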


Hebb learning rule - Historically the first proposed learning rule for neurons.
Weights are adjusted proportional to the product of the outputs of pre- and
post-weight neurons.


hidden layer - A layer of a network that is not connected to the network
output. (For instance, the first layer of a two-layer feedforward network.)


home neuron - A neuron at the center of a neighborhood.


hybrid bisection-cubic search - A line search that combines bisection and
cubic interpolation.


input layer - A layer of neurons receiving inputs directly from outside the
network.


initialization - The process of setting the network weights and biases to their
original values.
input space - The range of all possible input vectors.

input vector - A vector presented to the network.

input weights - The weights connecting network inputs to layers. 
input weight vector - The row vector of weights going to a neuron. 


Jacobian matrix - Contains the first derivatives of the network errors with 
respect to the weights and biases. 


Kohonen learning rule - A learning rule that trains selected neurons' weight
vectors to take on the values of the current input vector.


layer - A group of neurons having connections to the same inputs and sending
outputs to the same destinations.


layer diagram - A network architecture figure showing the layers and the
weight matrices connecting them. Each layer's transfer function is indicated
with a symbol. Sizes of input, output, bias and weight matrices are shown.
Individual neurons and connections are not shown. (See Chapter 2.)


layer weights - The weights connecting layers to other layers. Such weights
need to have non-zero delays if they form a recurrent connection (i.e., a loop).


learning - The process by which weights and biases are adjusted to achieve
some desired network behavior.


learning rate - A training parameter that controls the size of weight and bias
changes during learning.


learning rules - Methods of deriving the next changes that might be made in
a network OR a procedure for modifying the weights and biases of a network.


Levenberg-Marquardt - An algorithm that trains a neural network 10 to 100
times faster than the usual gradient descent backpropagation method. It will
always compute the approximate Hessian matrix, which has dimensions n-by-n,
where n is the number of weights and biases.


line search function - Procedure for searching along a given search direction
(line) to locate the minimum of the network performance.


linear transfer function - A transfer function that produces its input as its
output.


link distance - The number of links, or steps, that must be taken to get to the
neuron under consideration.





local minimum - The minimum of a function over a limited range of input
values. A local minimum may not be the global minimum.


log-sigmoid transfer function - A squashing function of the form shown below
that maps the input to the interval (0,1). (The toolbox function is logsig.)


logsig(n) = 1 / (1 + e^(-n))
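The log-sigmoid formula translates directly into Python. This is a minimal scalar sketch with a hypothetical name, not the toolbox's logsig; note that math.exp overflows for very large negative n, which a production version would avoid by branching on the sign of n.

```python
import math


def logsig_sketch(n):
    """Log-sigmoid: 1/(1 + e^-n), with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-n))


print(logsig_sketch(0.0))  # 0.5
```

The function is symmetric about its midpoint: logsig(n) + logsig(-n) = 1 for every n.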





Manhattan distance - The Manhattan distance between two vectors x and y is
calculated as:


D = sum(abs(x-y))
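The same sum of absolute differences in Python, as a minimal sketch with a hypothetical function name:

```python
def manhattan_distance(x, y):
    """D = sum(abs(x - y)) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


print(manhattan_distance([1, 2, 3], [4, 0, 3]))  # 5
```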


maximum performance increase - The maximum amount by which the
performance is allowed to increase in one iteration of the variable learning rate
training algorithm.


maximum step size - The maximum step size allowed during a linear search.
The magnitude of the weight vector is not allowed to increase by more than this
maximum step size in one iteration of a training algorithm.


mean square error function - The performance function that calculates the
average squared error between the network outputs a and the target outputs t.
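A minimal Python sketch of the mean squared error for one vector of targets t and outputs a; the function name is hypothetical.

```python
def mean_square_error(t, a):
    """Average of the squared differences between targets t and outputs a."""
    errors = [ti - ai for ti, ai in zip(t, a)]
    return sum(e * e for e in errors) / len(errors)


print(mean_square_error([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # errors 0, 1, 2 give 5/3
```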


momentum - A technique often used to make it less likely for a
backpropagation network to get caught in a shallow minimum.


momentum constant - A training parameter that controls how much
“momentum” is used.


mu parameter - The initial value for the scalar mu.


neighborhood - A group of neurons within a specified distance of a particular
neuron. The neighborhood is specified by the indices for all of the neurons that
lie within a specified radius of the winning neuron.


net input vector - The combination, in a layer, of all the layer's weighted input
vectors with its bias.


neuron - The basic processing element of a neural network. Includes weights
and bias, a summing junction and an output transfer function. Artificial
neurons, such as those simulated and trained with this toolbox, are
abstractions of biological neurons.


neuron diagram - A network architecture figure showing the neurons and the
weights connecting them. Each neuron's transfer function is indicated with a
symbol.


ordering phase - Period of training during which neuron weights are expected
to order themselves in the input space consistent with the associated neuron
positions.


output layer - A layer whose output is passed to the world outside the network.


output vector - The output of a neural network. Each element of the output
vector is the output of a neuron.


output weight vector - The column vector of weights coming from a neuron or
input. (See outstar learning rule.)


outstar learning rule - A learning rule that trains a neuron's (or input's)
output weight vector to take on the values of the current output vector of the
post-weight layer. Changes in the weights are proportional to the neuron's
output.


overfitting - A case in which the error on the training set is driven to a very
small value, but when new data is presented to the network, the error is large.


pass - Each traverse through all of the training input and target vectors.


pattern - A vector.


pattern association - The task performed by a network trained to respond
with the correct output vector for each presented input vector.


pattern recognition - The task performed by a network trained to respond
when an input vector close to a learned vector is presented. The network
“recognizes” the input as one of the original target vectors.


performance function - Commonly the mean squared error of the network
outputs. However, the toolbox also considers other performance functions.
Type help nnet and look under performance functions.


perceptron - A single-layer network with a hard-limit transfer function. This
network is often trained with the perceptron learning rule.





perceptron learning rule - A learning rule for training single-layer hard-limit
networks. It is guaranteed to result in a perfectly functioning network in finite
time, given that the network is capable of doing so.


performance - The behavior of a network.


Polak-Ribière update - A method developed by Polak and Ribière for
computing a set of conjugate directions. These directions are used as search
directions as part of a conjugate gradient optimization procedure.


positive linear transfer function - A transfer function that produces an
output of zero for negative inputs and an output equal to the input for positive
inputs.
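A minimal Python sketch of the positive linear rule; positive_linear is a hypothetical name, as this entry does not name the toolbox function.

```python
def positive_linear(n):
    """0 for negative inputs, the input itself otherwise."""
    return n if n > 0 else 0.0


print([positive_linear(x) for x in (-3.0, 0.0, 2.5)])  # [0.0, 0.0, 2.5]
```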


postprocessing - Converts normalized outputs back into the same units that 
were used for the original targets. 


Powell-Beale restarts - A method developed by Powell and Beale for
computing a set of conjugate directions. These directions are used as search
directions as part of a conjugate gradient optimization procedure. This
procedure also periodically resets the search direction to the negative of the
gradient.


preprocessing - Perform some transformation of the input or target data
before it is presented to the neural network.


principal component analysis - Orthogonalize the components of network
input vectors. This procedure can also reduce the dimension of the input
vectors by eliminating redundant components.


quasi-Newton algorithm - Class of optimization algorithm based on Newton's
method. An approximate Hessian matrix is computed at each iteration of the
algorithm based on the gradients.


radial basis networks - A neural network that can be designed directly by
fitting special response elements where they will do the most good.


radial basis transfer function - The transfer function for a radial basis
neuron is:


radbas(n) = e^(-n^2)
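A minimal Python sketch of the radial basis formula (hypothetical name, scalar input). It also shows where the output falls to 0.5, at |n| = sqrt(ln 2), roughly 0.8326, which is the point the spread constant is defined relative to.

```python
import math


def radbas_sketch(n):
    """Radial basis: e^(-n^2), equal to 1 at n = 0 and falling toward 0."""
    return math.exp(-n * n)


half_point = math.sqrt(math.log(2.0))  # where the output drops to 0.5
print(radbas_sketch(0.0), radbas_sketch(half_point))
```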
regularization - Involves modifying the performance function, which is
normally chosen to be the sum of squares of the network errors on the training
set, by adding some fraction of the squares of the network weights.


resilient backpropagation - A training algorithm that eliminates the harmful
effect of having a small slope at the extreme ends of the sigmoid “squashing”
transfer functions.


saturating linear transfer function - A function that is linear in the interval
[0,1] and saturates outside this interval to 0 or 1. (The toolbox function is
satlin.)


scaled conjugate gradient algorithm - Avoids the time consuming line search
of the standard conjugate gradient algorithm.


sequential input vectors - A set of vectors that are to be presented to a
network “one after the other.” The network weights and biases are adjusted on
the presentation of each input vector.


sigma parameter - Determines the change in weight for the calculation of the
approximate Hessian matrix in the scaled conjugate gradient algorithm.


sigmoid - Monotonic S-shaped function mapping numbers in the interval
(-∞,∞) to a finite interval such as (-1,+1) or (0,1).


simulation - Takes the network input p, and the network object net, and
returns the network outputs a.


spread constant - The distance an input vector must be from a neuron's weight
vector to produce an output of 0.5.


squashing function - A monotonic increasing function that takes input values
between -∞ and +∞ and returns values in a finite interval.


star learning rule - A learning rule that trains a neuron's weight vector to take
on the values of the current input vector. Changes in the weights are
proportional to the neuron's output.


sum-squared error - The sum of squared differences between the network
targets and actual outputs for a given input vector or set of vectors.
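A minimal Python sketch of the sum-squared error for one vector of targets t and outputs a (hypothetical name). It differs from the mean squared error only in not dividing by the number of elements.

```python
def sum_squared_error(t, a):
    """Sum of squared differences between targets t and outputs a."""
    return sum((ti - ai) ** 2 for ti, ai in zip(t, a))


print(sum_squared_error([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # 5.0
```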


supervised learning - A learning process in which changes in a network's
weights and biases are due to the intervention of an external teacher. The
teacher typically provides output targets.





symmetric hard-limit transfer function - A transfer function that maps inputs
greater than or equal to 0 to +1, and all other values to -1.
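A minimal Python sketch of the symmetric hard-limit rule; the function name is hypothetical.

```python
def symmetric_hard_limit(n):
    """+1 for n >= 0, -1 otherwise."""
    return 1 if n >= 0 else -1


print([symmetric_hard_limit(x) for x in (-0.5, 0.0, 2.0)])  # [-1, 1, 1]
```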


symmetric saturating linear transfer function - Produces the input as its
output as long as the input is in the range -1 to 1. Outside that range the output
is -1 and +1 respectively.
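The symmetric saturating linear rule is a clip to [-1, 1]. A minimal Python sketch with a hypothetical name:

```python
def symmetric_saturating_linear(n):
    """Pass n through unchanged inside [-1, 1]; clip to -1 or +1 outside."""
    return min(max(n, -1.0), 1.0)


print([symmetric_saturating_linear(x) for x in (-3.0, -0.5, 0.25, 7.0)])
# [-1.0, -0.5, 0.25, 1.0]
```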


tan-sigmoid transfer function - A squashing function of the form shown
below that maps the input to the interval (-1,1). (The toolbox function is
tansig.)


tansig(n) = 2 / (1 + e^(-2n)) - 1
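The tan-sigmoid can be written as 2/(1 + e^(-2n)) - 1, which is algebraically identical to the hyperbolic tangent tanh(n). The Python sketch below (hypothetical name, scalar input) evaluates that form and compares it with math.tanh.

```python
import math


def tansig_sketch(n):
    """Tan-sigmoid: 2/(1 + e^(-2n)) - 1, mathematically equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


for n in (-2.0, 0.0, 0.7):
    print(n, tansig_sketch(n), math.tanh(n))
```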
tapped delay line - A sequential set of delays with outputs available at each
delay output.


target vector - The desired output vector for a given input vector. 


test vectors - A set of input vectors (not used directly in training) that is used
to test the trained network.


topology functions - Ways to arrange the neurons in a grid, box, hexagonal, or
random topology.


training - A procedure whereby a network is adjusted to do a particular job.
Commonly viewed as an “offline” job, as opposed to an adjustment made during
each time interval as is done in adaptive training.


training vector - An input and/or target vector used to train a network.


transfer function - The function that maps a neuron's (or layer's) net input
n to its actual output.


tuning phase - Period of SOFM training during which weights are expected to
spread out relatively evenly over the input space while retaining their
topological order found during the ordering phase.


underdetermined system - A system that has more variables than
constraints.


unsupervised learning - A learning process in which changes in a network's
weights and biases are not due to the intervention of any external teacher.
Commonly changes are a function of the current network input vectors, output
vectors, and previous weights and biases.

update - Make a change in weights and biases. The update can occur after
presentation of a single input vector or after accumulating changes over
several input vectors.


validation vectors - A set of input vectors (not used directly in training) that
is used to monitor training progress so as to keep the network from overfitting.


weighted input vector - The result of applying a weight to a layer's input,
whether it is a network input or the output of another layer.


weight function - Weight functions apply weights to an input to get weighted
inputs as specified by a particular function.


weight matrix - A matrix containing connection strengths from a layer's
inputs to its neurons. The element wij of a weight matrix W refers to the
connection strength from input j to neuron i.


Widrow-Hoff learning rule - A learning rule used to train single-layer
linear networks. This rule is the predecessor of the backpropagation rule and
is sometimes referred to as the delta rule.

Block Set


The Neural Network Toolbox provides a set of blocks you can use to build 
neural networks in Simulink or which can be used by the function gensim to 
generate the Simulink version of any network you have created in MATLAB. 


Bring up the Neural Network Toolbox blockset with this command. 


neural 


The result is a window that contains four blocks. Each of these blocks contains
additional blocks.


[Figure: Neural Network Toolbox Block Library window showing the Transfer Functions, Net Input Functions, Weight Functions, and Control Systems blocks]





Transfer Function Blocks


Double-click on the Transfer Functions block in the Neural window to bring up 
a window containing several transfer function blocks. 
Each of these blocks takes a net input vector and generates a corresponding
output vector whose dimensions are the same as the input vector.


Net Input Blocks


Double-click on the Net Input Functions block in the Neural window to bring
up a window containing two net-input function blocks. 


Each of these blocks takes any number of weighted input vectors, weighted layer
output vectors, and bias vectors, and returns a net-input vector.


Weight Blocks 


Double-click on the Weight Functions block in the Neural window to bring up 
a window containing three weight function blocks. 
Each of these blocks takes a neuron's weight vector and applies it to an input
vector (or a layer output vector) to get a weighted input value for a neuron.


It is important to note that the blocks above expect the neuron's weight vector
to be defined as a column vector. This is because Simulink signals can be
column vectors, but cannot be matrices or row vectors.


It is also important to note that because of this limitation you have to create S
weight function blocks (one for each row) to implement a weight matrix going
to a layer with S neurons.

This contrasts with the other two kinds of blocks. Only one net input function
and one transfer function block are required for each layer.


Control Systems Blocks 


Double-click on the Control Systems block in the Neural window to bring up a 
window containing four control systems blocks. 
Chapter 6 “Control Systems” describes the application of these control systems
blocks.

Block Generation


The function gensim generates block descriptions of networks so you can
simulate them in Simulink.


gensim(net,st)


The second argument to gensim determines the sample time, which is normally
chosen to be some positive real value.


If a network has no delays associated with its input weights or layer weights,
this value can be set to -1. A value of -1 tells gensim to generate a network with
continuous sampling.


Example
Here is a simple problem defining a set of inputs p and corresponding targets t.


p = [1 2 3 4 5];
t = [1 3 5 7 9];


The code below designs a linear layer to solve this problem. 


net = newlind(p,t) 
We can test the network on our original inputs with sim.
y = sim(net,p)
The results returned show the network has solved the problem.


y =
1.0000 3.0000 5.0000 7.0000 9.0000
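For this one-input, one-output problem, designing the linear layer amounts to fitting a straight line t ≈ w*p + b. The Python sketch below assumes an ordinary least-squares fit in closed form, which is not necessarily how newlind computes its solution internally; fit_line is a hypothetical helper. For this data it recovers w = 2 and b = -1 and reproduces the outputs y listed above.

```python
def fit_line(p, t):
    """Least-squares slope w and intercept b for t ~ w*p + b."""
    n = len(p)
    mean_p, mean_t = sum(p) / n, sum(t) / n
    sxy = sum((x - mean_p) * (y - mean_t) for x, y in zip(p, t))
    sxx = sum((x - mean_p) ** 2 for x in p)
    w = sxy / sxx
    b = mean_t - w * mean_p
    return w, b


p = [1, 2, 3, 4, 5]
t = [1, 3, 5, 7, 9]
w, b = fit_line(p, t)        # w = 2.0, b = -1.0
y = [w * x + b for x in p]   # [1.0, 3.0, 5.0, 7.0, 9.0]
print(w * 2 + b)             # 3.0, the response to an input of 2
```

The last line matches what the Simulink run should show when the constant input is set to 2.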


Call gensim as follows to generate a Simulink version of the network. 
gensim(net,-1) 

The second argument is -1 so the resulting network block samples
continuously.


The call to gensim results in the following screen. It contains a Simulink
system consisting of the linear network connected to a sample input and a
scope.
[Figure: Simulink system generated by gensim, with a Constant input block connected to the network block and a Scope]
The input block is actually a standard Constant block. Change the constant
value from the initial randomly generated value to 2, and then select Close.


Select Start from the Simulation menu. Simulink momentarily pauses as it
simulates the system.


When the simulation is over, double-click the scope at the right to see the 
following display of the network's response. 


[Figure: Scope display of the network's response]
Note that the output is 3, which is the correct output for an input of 2. 


Exercises 
Here are a couple of exercises you can try.


Changing Input Signal 

Replace the constant input block with a signal generator from the standard
Simulink block set Sources. Simulate the system and view the network's
response.


Discrete Sample Time 


Recreate the network, but with a discrete sample time of 0.5, instead of
continuous sampling.


gensim(net,0.5) 


Again replace the constant input with a signal generator. Simulate the system 
and view the network's response. 
Dimensions


The following code dimensions are used in describing both the network signals
that users commonly see, and those used by the utility functions:


Ni = number of network inputs = net.numInputs


Ri = number of elements in input i = net.inputs{i}.size


Nl = number of layers = net.numLayers


Si = number of neurons in layer i = net.layers{i}.size


Nt = number of targets = net.numTargets


Vi = number of elements in target i, equal to Sj, where j is the ith layer with a
target. (A layer n has a target if net.targets(n) == 1.)


No = number of network outputs = net.numOutputs


Ui = number of elements in output i, equal to Sj, where j is the ith layer with
an output. (A layer n has an output if net.outputs(n) == 1.)


ID = number of input delays = net.numInputDelays


LD = number of layer delays = net.numLayerDelays


TS = number of time steps


Q = number of concurrent vectors or sequences.


Variables 







The variables a user commonly uses when defining a simulation or training
session are:


P
Network inputs.
Ni-by-TS cell array, each element P{i,ts} is an Ri-by-Q matrix.

Pi
Initial input delay conditions.
Ni-by-ID cell array, each element Pi{i,k} is an Ri-by-Q matrix.

Ai
Initial layer delay conditions.
Nl-by-LD cell array, each element Ai{i,k} is an Si-by-Q matrix.

T
Network targets.
Nt-by-TS cell array, each element T{i,ts} is a Vi-by-Q matrix.


These variables are returned by simulation and training calls:


Y
Network outputs.
No-by-TS cell array, each element Y{i,ts} is a Ui-by-Q matrix.

E
Network errors.
Nt-by-TS cell array, each element E{i,ts} is a Vi-by-Q matrix.

perf
Network performance.


Utility Funcftion Variables 
These variables are used only by the utility functions. 
Pc 
Combined inputs. 
Ni-by-(ID+TS) cell array, each element Pfitsj is an Ri-by-Q matrix. 


Pc = [Pi P] = Initial input delay conditions and network inputs. 


Pd
Delayed inputs.
Nl-by-Ni-by-TS cell array, each element Pd{i,j,ts} is an (Rj*IWD(i,j))-by-Q
matrix, where IWD(i,j) is the number of delay taps associated with the input
weight to layer i from input j.

Equivalently, IWD(i,j) = length(net.inputWeights{i,j}.delays).

Pd is the result of passing the elements of P through each input weight's
tap delay line. Since inputs are always transformed by the input delays in
the same way, it saves time to do that operation only once instead of at
every training step.


BZ
Concurrent bias vectors.
Nl-by-1 cell array, each element BZ{i} is an Si-by-Q matrix.

Each matrix is simply Q copies of the net.b{i} bias vector.


IWZ
Weighted inputs.
Nl-by-Ni-by-TS cell array, each element IWZ{i,j,ts} is an Si-by-Q matrix.

LWZ
Weighted layer outputs.
Nl-by-Nl-by-TS cell array, each element LWZ{i,j,ts} is an Si-by-Q matrix.


N
Net inputs.
Nl-by-TS cell array, each element N{i,ts} is an Si-by-Q matrix.

A
Layer outputs.
Nl-by-TS cell array, each element A{i,ts} is an Si-by-Q matrix.

Ac
Combined layer outputs.
Nl-by-(LD+TS) cell array, each element Ac{i,ts} is an Si-by-Q matrix.

Ac = [Ai A] = Initial layer delay conditions and layer outputs.
Tl
Layer targets.
Nl-by-TS cell array, each element Tl{i,ts} is an Si-by-Q matrix.

Tl contains empty matrices [] in rows of layers i not associated with
targets, indicated by net.targets(i) == 0.

El
Layer errors.
Nl-by-TS cell array, each element El{i,ts} is an Si-by-Q matrix.

El contains empty matrices [] in rows of layers i not associated with
targets, indicated by net.targets(i) == 0.
X
Column vector of all weight and bias values.
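How Pc, Pd, and BZ relate to the user-level variables can be sketched in Python, with nested lists standing in for MATLAB cell arrays and matrices and 0-based indices in place of MATLAB's 1-based ones. The helper names are hypothetical (this is not the toolbox's calcpd), and the sketch assumes that time runs left to right in Pc, that delay 0 means the current time step, and that every tap delay d is at most ID.

```python
def combine(Pi, P):
    """Pc = [Pi P]: join each input's delay-condition row and input row,
    giving an Ni-by-(ID+TS) cell array."""
    return [pi_row + p_row for pi_row, p_row in zip(Pi, P)]


def delayed_input(Pc, j, ts, delays, ID):
    """Pd{i,j,ts} for one input weight with the given tap delays.

    Stacks Pc{j, ID+ts-d} for each delay d (0-based ts, d <= ID assumed),
    so the result has Rj*len(delays) rows and Q columns.
    """
    rows = []
    for d in delays:
        rows.extend(Pc[j][ID + ts - d])
    return rows


def concurrent_bias(b, Q):
    """BZ{i}: Q copies of the bias column b, as an Si-by-Q matrix."""
    return [[bk] * Q for bk in b]


# Example: one input (R1 = 1), ID = 2, TS = 3, Q = 1.
Pi = [[[[10.0]], [[20.0]]]]
P = [[[[1.0]], [[2.0]], [[3.0]]]]
Pc = combine(Pi, P)
pd = delayed_input(Pc, 0, 2, [0, 1, 2], 2)  # current value and two past values
bz = concurrent_bias([0.5, -1.0], 3)
```

Because Pd depends only on the inputs and the delay settings, it can be computed once and reused at every training step, which is the saving described for Pd above.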
Functions


The following utility functions can be called to perform much of the work of
simulating or training a network. Each is described in its own help comments.


These functions calculate signals. 


calcpd, calca, calca1, calce, calce1, calcperf
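The signal variables these functions work with fit together as in the Python sketch below, for a single layer at a single time step: the weighted inputs (IWZ) and weighted layer outputs (LWZ) are added to the concurrent biases (BZ) to form the net input N, and the transfer function turns N into the layer output A. It assumes the dot-product weight function, summation as the net input function, and tanh as the transfer function (each is configurable in a real network); the helper names are hypothetical and this is not the toolbox's calca implementation.

```python
import math


def matmul(W, X):
    """Matrix product W*X for matrices stored as lists of rows."""
    return [[sum(w * X[k][c] for k, w in enumerate(row)) for c in range(len(X[0]))]
            for row in W]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_step(IW, Pd, LW, Ac_delayed, BZ, transfer=math.tanh):
    """One layer, one time step: return (N, A).

    IW[j]*Pd[j] are the weighted inputs (IWZ), LW[k]*Ac_delayed[k] the
    weighted layer outputs (LWZ); both are summed with BZ to give N, and
    the transfer function is applied element-wise to give A.
    """
    N = BZ
    for W, X in zip(IW, Pd):
        N = add(N, matmul(W, X))   # IWZ contribution
    for W, X in zip(LW, Ac_delayed):
        N = add(N, matmul(W, X))   # LWZ contribution
    A = [[transfer(x) for x in row] for row in N]
    return N, A


# Layer with S = 1, one input weight on 2 delayed input elements, Q = 1.
N, A = layer_step([[[2.0, 1.0]]], [[[1.0], [3.0]]], [], [], [[-5.0]])
```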


These functions calculate derivatives, Jacobians, and values associated with
Jacobians.


calcgx, calcjx, calcjejj


calcgx is used for gradient algorithms; calcjx and calcjejj can be used to
calculate approximations of the Hessian for algorithms such as
Levenberg-Marquardt.
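For a sum-of-squared-errors performance measure, the Gauss-Newton approximation used by Levenberg-Marquardt replaces the Hessian with JᵀJ and uses Jᵀe as the matching gradient term (each up to a constant factor of 2), where J holds the derivatives of the errors with respect to every weight and bias. The Python sketch below computes those two quantities for a small example; the function names are hypothetical and do not reproduce the signature of calcjejj.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def jejj(J, e):
    """Return (JtJ, Jte) for a Jacobian J (one row per error element, one
    column per weight or bias) and an error vector e (list of scalars)."""
    Jt = transpose(J)
    JtJ = [[sum(a * b for a, b in zip(ri, rj)) for rj in Jt] for ri in Jt]
    Jte = [sum(a * b for a, b in zip(ri, e)) for ri in Jt]
    return JtJ, Jte


# Three error elements, two parameters.
J = [[1.0, 0.0], [1.0, 2.0], [0.0, 1.0]]
e = [1.0, 2.0, 3.0]
JtJ, Jte = jejj(J, e)   # JtJ = [[2, 2], [2, 5]], Jte = [3, 7]
```

Forming JᵀJ needs only first derivatives, which is why Jacobian-based methods can approximate curvature without computing second derivatives.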


These functions allow network weight and bias values to be accessed and
altered in terms of a single vector X.


setx, getx, formx
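The idea of viewing all weights and biases as one vector X, and of rebuilding the matrices from it, can be sketched in Python as a pack/unpack pair. This section does not specify the order in which the toolbox places weights and biases in X, so the sketch simply assumes the listed order of matrices and column-major order within each matrix (as MATLAB's W(:) produces); pack and unpack are hypothetical names, not getx and setx.

```python
def pack(matrices):
    """Concatenate the columns of each matrix into one flat vector X
    (column-major within each matrix); also return the shapes."""
    X, shapes = [], []
    for M in matrices:
        rows, cols = len(M), len(M[0])
        shapes.append((rows, cols))
        X.extend(M[r][c] for c in range(cols) for r in range(rows))
    return X, shapes


def unpack(X, shapes):
    """Inverse of pack: rebuild the matrices from X and their shapes."""
    out, pos = [], 0
    for rows, cols in shapes:
        M = [[0.0] * cols for _ in range(rows)]
        for c in range(cols):
            for r in range(rows):
                M[r][c] = X[pos]
                pos += 1
        out.append(M)
    return out


W = [[1.0, 2.0], [3.0, 4.0]]   # a 2-by-2 weight matrix
b = [[5.0], [6.0]]             # a 2-by-1 bias vector
X, shapes = pack([W, b])       # [1.0, 3.0, 2.0, 4.0, 5.0, 6.0]
```

A single vector is convenient for optimizers that treat the network as a function of one parameter vector, such as the gradient and Jacobian-based algorithms mentioned above.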
Code Efficiency


The functions sim, train, and adapt all convert a network object to a structure,

net = struct(net);

before simulation and training, and then recast the structure back to a
network:

net = class(net,'network');

This is done for speed efficiency, since structure fields are accessed directly,
while object fields are accessed through the MATLAB object method handling
system. If users write any code that uses the utility functions outside of sim,
train, or adapt, they should use the same technique.


Argument Checking 


These functions are recommended only for advanced users.

None of the utility functions performs any argument checking, so the only
feedback you get from calling them with incorrectly sized arguments is an
error.

The lack of argument checking allows these functions to run as fast as possible.

For “safer” simulation and training, use sim, train, and adapt.
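When you call fast, unchecked routines from your own code, one option is to validate argument sizes once in a thin wrapper. The Python sketch below checks that P is an Ni-by-TS cell array of Ri-by-Q matrices before handing it to an unchecked function; check_cell_array, checked, and fast_fn are hypothetical names, with nested lists standing in for MATLAB cell arrays.

```python
def check_cell_array(C, n_rows, n_cols, row_sizes, Q, name):
    """Raise ValueError unless C is n_rows-by-n_cols and every element
    C[i][k] is a row_sizes[i]-by-Q matrix."""
    if len(C) != n_rows:
        raise ValueError(f"{name}: expected {n_rows} rows, got {len(C)}")
    for i, row in enumerate(C):
        if len(row) != n_cols:
            raise ValueError(f"{name}: row {i} has {len(row)} columns, expected {n_cols}")
        for k, M in enumerate(row):
            if len(M) != row_sizes[i] or any(len(r) != Q for r in M):
                raise ValueError(f"{name}{{{i},{k}}}: expected {row_sizes[i]}-by-{Q}")


def checked(fast_fn, P, R, TS, Q):
    """Validate P (Ni-by-TS, each element Ri-by-Q), then call fast_fn(P)."""
    check_cell_array(P, len(R), TS, R, Q, "P")
    return fast_fn(P)


# One input with R1 = 2, TS = 2, Q = 1.
P_ok = [[[[1.0], [2.0]], [[3.0], [4.0]]]]
n_steps = checked(lambda p: len(p[0]), P_ok, [2], 2, 1)
```

The check costs one pass over every matrix, which is exactly the overhead the unchecked utility functions avoid; paying it once per call rather than inside inner loops keeps most of the speed.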